TC 304
8-bit Character Sets - User Requirements
The ultimate user of equipment that makes use of coded character sets is
concerned with such matters as which languages it supports, whether page
layout information can be preserved during interchange of documents, and other
matters of a similar nature. This section of the guide is concerned with the
facilities available in character set standards to meet such requirements.
One of the prime requirements in the use of character sets is the ability to
support the languages that matter to the user. A number of International
Standards have been developed to provide multilingual support. Other languages
are supported by the character sets of the International Register of Coded
Character Sets to be used with Escape Sequences.
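Python's standard `iso2022_jp` codec gives a concrete picture of how escape sequences select a registered character set. It switches between ASCII and the registered Japanese double-byte set JIS X 0208 using ISO/IEC 2022 designation sequences. It is used here only because it is a convenient, widely available example of the mechanism, and this guide does not recommend it as a profile. The helper name `encode_with_escapes` is hypothetical.

```python
def encode_with_escapes(text: str) -> bytes:
    """Encode text with ISO-2022-JP, which relies on designation escape sequences."""
    return text.encode("iso2022_jp")

# "Tokyo" followed by the two kanji for Tokyo (U+6771, U+4EAC)
encoded = encode_with_escapes("Tokyo \u6771\u4eac")
print(encoded)
# ESC $ B (0x1B 0x24 0x42) designates JIS X 0208 to G0 before the kanji;
# ESC ( B (0x1B 0x28 0x42) designates ASCII again at the end.
```

The ASCII prefix is sent unchanged. Only the switch into and out of the double-byte set costs extra bytes.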
This page provides an index to the various sections of the guide containing
information on the languages supported by such standards and register
entries.
- For variants of the ASCII 7-bit character set, each designed primarily to
support a single language written in the Latin script, see ISO/IEC 646. See the historical introduction for more about the origins of ASCII.
- For multilingual support obtained by supplementing ASCII to form an 8-bit code, see ISO/IEC 8859. This standard permits simultaneous support either for a range of languages in the Latin script, or for the basic Latin letters (A-Z and a-z only, without accented letters or letters such as þ and ø) together with one other alphabet, such as Greek or Cyrillic.
- For the greatest support of languages in the Latin script that can be
achieved with a single 8-bit code, see ISO/IEC 6937.
This is achieved by the use of non-spacing diacritical marks, which allow
more characters to be represented than there are positions in the code table.
This complication has its price; see the guidance on application environments.
- To achieve the level of support of Latin languages that is provided by ISO/IEC 6937 but without the use of non-spacing
diacritical marks, see ISO/IEC 10367. This is
achieved by the use of locking shifts. This complication also has its price;
again see the guidance on application environments.
- To support up to three of the four scripts Greek, Cyrillic, Hebrew and
Arabic simultaneously with basic use of the Latin alphabet, or up to two of
them simultaneously with a wide range of Latin languages, see ISO/IEC 10367. This is achieved by the use of both
locking shifts and non-spacing diacritical marks. The provisos of both
preceding entries in this list therefore apply.
- To support Chinese, Japanese and Korean ideographic scripts, see the International Register.
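The sketch below contrasts two of the approaches listed above. The first part uses Python's standard ISO/IEC 8859 codecs. It shows that one upper-half byte stands for a different letter in each part, while the lower half is always ASCII. Python has no built-in ISO/IEC 6937 codec, so the second part decodes only a small, hypothetical subset of that standard. The diacritic byte values in `DIACRITICS_6937` are assumptions and should be checked against the standard. The decoder also treats every byte below 0x80 as ASCII, which is a simplification.

```python
import unicodedata

# ISO/IEC 8859: bytes 0x00-0x7F are ASCII in every part; the upper half
# (0xA0-0xFF) holds a different repertoire in each part.
def decode_8859(data: bytes, part: int) -> str:
    """Decode bytes with Python's codec for ISO/IEC 8859-<part>."""
    return data.decode(f"iso8859_{part}")

sample = bytes([0x41, 0xE4])       # "A" followed by one upper-half byte
latin1 = decode_8859(sample, 1)    # Latin alphabet No. 1
greek = decode_8859(sample, 7)     # Latin/Greek
cyrillic = decode_8859(sample, 5)  # Latin/Cyrillic

# ISO/IEC 6937: a non-spacing diacritical mark is sent BEFORE its base letter.
# Assumed byte values for a small subset of the marks (verify against the standard).
DIACRITICS_6937 = {
    0xC1: "\u0300",  # grave accent
    0xC2: "\u0301",  # acute accent
    0xC3: "\u0302",  # circumflex accent
    0xC8: "\u0308",  # diaeresis
    0xCB: "\u0327",  # cedilla
    0xCF: "\u030C",  # caron
}

def decode_6937_subset(data: bytes) -> str:
    """Decode a hypothetical ISO/IEC 6937 subset: ASCII plus the marks above."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b in DIACRITICS_6937 and i + 1 < len(data) and data[i + 1] < 0x80:
            # Attach the mark to the following base letter, then let NFC
            # choose the precomposed Unicode character where one exists.
            base = chr(data[i + 1])
            out.append(unicodedata.normalize("NFC", base + DIACRITICS_6937[b]))
            i += 2
        elif b < 0x80:
            out.append(chr(b))  # simplification: treat 0x00-0x7F as ASCII
            i += 1
        else:
            raise ValueError(f"byte 0x{b:02X} is outside this sketch's subset")
    return "".join(out)

encoded_6937 = b"Dvo\xCFr\xC2ak"  # "Dvorak" with caron on r and acute on a
word = decode_6937_subset(encoded_6937)
print(latin1, greek, cyrillic, word)
```

The ISO/IEC 6937 string needs 8 bytes for 6 letters because each accented letter takes two bytes. This variable length is the kind of cost that the guidance on application environments addresses.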
It may be helpful to read the introduction to concepts
and terminology before following some of the above references.
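Locking shifts, the mechanism specified in ISO/IEC 2022 and relied on by ISO/IEC 10367, can be sketched as a decoder that keeps state. A shift sequence selects which designated set is interpreted in the right-hand half of the code table, and that choice stays in effect until the next shift. The sketch assumes that G1, G2 and G3 have already been designated as the right-hand halves of ISO/IEC 8859-1, 8859-7 and 8859-5, and that G1 is invoked at the start. In real data those designations are themselves made by escape sequences, which this sketch does not parse. The function name is hypothetical.

```python
# Hypothetical 8-bit locking-shift decoder in the style of ISO/IEC 2022.
# Assumption: G1, G2 and G3 are already designated as the right-hand halves
# of ISO/IEC 8859-1, 8859-7 and 8859-5 respectively.
ESC = 0x1B

# Second byte of each locking shift that invokes G1, G2 or G3 into the
# right-hand half (GR) of the code table.
LOCKING_SHIFTS_GR = {
    0x7E: 1,  # ESC ~  = LS1R
    0x7D: 2,  # ESC }  = LS2R
    0x7C: 3,  # ESC |  = LS3R
}
G_SETS = {1: "iso8859_1", 2: "iso8859_7", 3: "iso8859_5"}

def decode_with_locking_shifts(data: bytes) -> str:
    gr = 1  # assumption: G1 is invoked into GR at the start of the data
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b == ESC and i + 1 < len(data) and data[i + 1] in LOCKING_SHIFTS_GR:
            # A locking shift stays in effect until the next locking shift.
            gr = LOCKING_SHIFTS_GR[data[i + 1]]
            i += 2
            continue
        if b >= 0xA0:
            # Interpret the byte in whichever set is currently invoked into GR.
            out.append(bytes([b]).decode(G_SETS[gr], errors="replace"))
        elif b < 0x80:
            out.append(chr(b))  # left-hand half: ASCII
        else:
            raise ValueError(f"C1 control 0x{b:02X} not handled by this sketch")
        i += 1
    return "".join(out)

# The same byte 0xE4 read under G1, G2, G3 and G1 again.
text = decode_with_locking_shifts(b"\xe4 \x1b}\xe4 \x1b|\xe4 \x1b~\xe4")
print(text)
```

Unlike the non-spacing diacritic approach, every graphic character here takes one byte. The cost moves to the receiver, which must track shift state from the start of the data and cannot interpret a byte in isolation.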
Text that is communicated by the use of coded character sets is usually
intended ultimately for presentation on a screen or printed page. There is
a need to be able to communicate, with that data, information concerning the
layout of the text on the screen or page.
Such layout information may be either
- embedded in the communicated text, or
- separated from the text as elements of some communication protocol.
Two standards are available that provide coded control functions for embedding
layout information in communicated text.
- Control functions applicable to character-imaging devices in general are
specified in ISO/IEC 6429. This standard includes
facilities that permit texts in different scripts to be presented in opposite
directions, such as mixed Latin and Arabic, or Latin and Hebrew.
- Control functions intended specifically for the control of page layout are
specified in ISO/IEC 10538. This standard includes
facilities both for fixed format and for automatically reformattable text.
The former assumes that both sender and receiver have the same fonts
available. The latter permits reformatting where sender and receiver have
access to different fonts.
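The sketch below illustrates layout information embedded in the text itself. It wraps a Hebrew word in the ISO/IEC 6429 control function SDS (START DIRECTED STRING, coded as CSI Ps ]). Ps = 2 starts a right-to-left string and Ps = 0 ends it. The code uses the 7-bit representation of CSI (ESC [) and Python's `iso8859_8` codec for the Hebrew letters. The helper names are hypothetical, and the sketch does not show how any particular device presents the result.

```python
CSI = "\x1b["  # 7-bit representation of CSI (ESC [); the 8-bit form is 0x9B

def sds(ps: int) -> str:
    """SDS, START DIRECTED STRING: CSI Ps ].
    Ps = 0 ends the string, 1 starts left-to-right, 2 starts right-to-left."""
    return f"{CSI}{ps}]"

def embed_right_to_left(latin: str, rtl: str) -> bytes:
    """Wrap rtl in SDS controls and encode with ISO/IEC 8859-8 (Latin/Hebrew)."""
    return (latin + sds(2) + rtl + sds(0)).encode("iso8859_8")

shalom = "\u05e9\u05dc\u05d5\u05dd"  # Hebrew letters shin, lamed, vav, final mem
data = embed_right_to_left("Shalom = ", shalom)
print(data)
```

The Hebrew letters are stored in logical (reading) order. Reversing them for display is left to the presentation device that interprets the control function.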
There are various published sources of guidance on the use of character set
standards in Europe. These include:
- Guidance on European character repertoires for use with 8-bit single-byte
coding, given in EN 1923.
- Guidance on the interchange of character data by means of the Telex
network in Europe, given in EN 1922.
- Guidance on the use of character sets in Europe in connection with OSI
Abstract Syntax Notation One (ASN.1), given in ISO/IEC
ISP 12070.