Re: location of UnicodeData.txt

David Starner Mon, 02 Dec 2002 21:15:56 -0600

> But they clearly do not want you to modify anything, including
> character name!  Character name is a searchable field, which some
> applications may need.


It's an English field, for which there is a canonical translation
into French, and there should be translations for other languages.
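
For illustration, Python's standard `unicodedata` module exposes the
Name field (the second field of each UnicodeData.txt line, as shipped
with whatever Unicode version your Python was built with). The names it
returns are the English ones, so a name search like the hypothetical
`find_by_name` helper below only works with English search terms:

```python
import sys
import unicodedata

def find_by_name(word):
    """Hypothetical helper: return every character whose Unicode name
    contains `word` as a whole word. Names are uppercase English, so
    only English search terms will match."""
    word = word.upper()
    hits = []
    for cp in range(sys.maxunicode + 1):
        name = unicodedata.name(chr(cp), "")  # "" for unnamed code points
        if word in name.split():
            hits.append(chr(cp))
    return hits

print(unicodedata.name("\u00e9"))                      # LATIN SMALL LETTER E WITH ACUTE
print(unicodedata.lookup("GREEK SMALL LETTER ALPHA"))  # α
print(len(find_by_name("alpha")))
```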

> The only overlap with any previous character coding is the first 127
> characters (ASCII).

Nope. There's massive overlap with previous character codings on 
all sorts of levels. The first 256 code points are identical to 
Latin-1 (ISO-8859-1); the Greek block is a superset of ISO-8859-7 
(that is, the characters are in the same order, but some of the 
gaps have been filled in), as are the Cyrillic and Arabic blocks 
for their respective 8859 standards. All the Indian blocks are 
weird echoes of ISCII. The basic CJK block consists of the 
ideographs from the preexisting Chinese, Japanese and Korean 
standards, sorted in the radical/stroke order of traditional 
dictionaries like the KangXi.
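
The Latin-1 and 8859 claims can be checked against the codecs that ship
with CPython. A minimal sketch, assuming the `latin-1`, `iso8859_5`
(Cyrillic), `iso8859_6` (Arabic) and `iso8859_7` (Greek) codecs: every
Latin-1 byte decodes to the code point with the same number, and the
bytes of each 8859 part that land in the matching Unicode block do so in
the same order. The helper `block_offsets` is made up for this example;
it also reports the byte-to-code-point offsets, which for these three
parts appear to collapse to a single fixed shift.

```python
def latin1_is_identity():
    # Every ISO-8859-1 byte value b decodes to the code point numbered b.
    return all(ord(bytes([b]).decode("latin-1")) == b for b in range(256))

def block_offsets(codec, lo, hi):
    """Hypothetical helper: decode each high byte 0xA0-0xFF with `codec`;
    for those landing in the Unicode block [lo, hi], return the set of
    (code point - byte) offsets and whether byte order is preserved."""
    pairs = []
    for b in range(0xA0, 0x100):
        try:
            ch = bytes([b]).decode(codec)
        except UnicodeDecodeError:
            continue  # byte left unassigned by this 8859 part
        if lo <= ord(ch) <= hi:
            pairs.append((b, ord(ch)))
    cps = [cp for _, cp in pairs]
    order_preserved = all(x < y for x, y in zip(cps, cps[1:]))
    offsets = {cp - b for b, cp in pairs}
    return offsets, order_preserved

print(latin1_is_identity())
for codec, lo, hi in [("iso8859_5", 0x0400, 0x04FF),   # Cyrillic block
                      ("iso8859_6", 0x0600, 0x06FF),   # Arabic block
                      ("iso8859_7", 0x0370, 0x03FF)]:  # Greek block
    offsets, ordered = block_offsets(codec, lo, hi)
    print(codec, [hex(o) for o in sorted(offsets)], ordered)
```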

> If a system simply declared a section of data to be
> UniCode data, and made no attempt to comprehend the contents, it
> probably would not need to have access to the contents of Unicode.txt.

Just like if a system simply declared a section of data to be
code compliant with Fortran-2026, and made no attempt to
comprehend it, it wouldn't need access to the contents of that
standard. A text-processing program that needs to display data is 
going to need the contents of UnicodeData for bidirectional (BiDi) 
layout. A proper cut program should use UnicodeData, so it doesn't 
separate a character from a following combining character. A spell 
program is going to need the data to know which characters end words. 
Anything that handles text in a way more complex than cat will need
access to this data.
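
A minimal sketch of the cut, spell and BiDi cases, using Python's
`unicodedata` module, whose `combining()`, `category()` and
`bidirectional()` functions return the canonical combining class,
general category and bidi class fields of UnicodeData.txt. `cut_chars`
and `is_word_char` are hypothetical, deliberately simplified helpers:
real tools should follow the grapheme cluster and word boundary rules
(UAX #29) and the Bidirectional Algorithm (UAX #9), which need more
than these fields.

```python
import unicodedata

def cut_chars(text, n):
    """Hypothetical helper: keep the first n characters of `text`, never
    separating a character from the combining marks (nonzero canonical
    combining class) that follow it. Simplification: spacing marks with
    class 0 are still treated as new characters."""
    out = []
    count = 0
    for ch in text:
        # A class-0 character, or a stray leading mark, starts a new unit.
        if unicodedata.combining(ch) == 0 or not out:
            if count == n:
                break
            count += 1
        out.append(ch)
    return "".join(out)

def is_word_char(ch):
    # Letters (L*), marks (M*) and decimal digits (Nd) count as word
    # characters; a rough stand-in for real word-boundary rules.
    cat = unicodedata.category(ch)
    return cat[0] in "LM" or cat == "Nd"

s = "e\u0301te\u0301"   # "été" spelled with combining acute accents
print(repr(cut_chars(s, 1)))                 # 'é' as e + U+0301, kept together
print(unicodedata.bidirectional("\u05d0"))  # Hebrew alef: R (right-to-left)
print(is_word_char("\u00e9"), is_word_char(","))
```

A naive cut by code point would return a bare "e" for the first
character and leave the accent dangling at the start of the rest.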
