> But they clearly do not want you to modify anything, including
> character name! Character name is a searchable field, which some
> applications may need.

It's an English field, for which there is a canonical translation into French, and there should be translations for other languages.

> The only overlap with any previous character coding is the first 127
> characters (ASCII).

Nope. There's massive overlap with previous character codings on all sorts of levels. The first 256 characters are Latin-1; the Greek block is a superset of ISO-8859-7 (that is, the characters are in the same order, but some of the gaps have been filled in), as are the Cyrillic and Arabic blocks for their respective 8859 standards. All the Indian blocks are weird echoes of ISCII. The basic CJK block is the ideographs from the preexisting Chinese, Japanese and Korean standards, sorted by the order of traditional dictionaries like the KangXi.

> If a system simply declared a section of data to be
> UniCode data, and made no attempt to comprehend the contents, it
> probably would not need to have access to the contents of Unicode.txt.

Just like if a system simply declared a section of data to be code compliant with Fortran-2026, and made no attempt to comprehend it, it wouldn't need access to the contents of that standard. A text-processing program that needs to display data is going to need the contents of UnicodeData for BiDi. A proper cut program should use UnicodeData, so it doesn't separate a character from a subsequent combining character. A spell program is going to need the data to know which characters end words. Anything that handles text in a way more complex than cat will need access to this data.
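
To make the cut example concrete, here is a minimal sketch in Python, whose standard unicodedata module exposes properties taken from the Unicode Character Database. The helper names clusters and cut_chars are made up for illustration, and treating every character with a General_Category of M* as attached to the preceding character is a simplifying assumption; real grapheme cluster segmentation has more rules than this.

```python
import unicodedata


def clusters(text):
    """Group each base character with the mark characters that follow it.

    Simplifying assumption: any character whose General_Category starts
    with "M" (Mn, Mc, Me) belongs to the preceding character.
    """
    out = []
    for ch in text:
        if out and unicodedata.category(ch).startswith("M"):
            out[-1] += ch  # attach the mark to the previous cluster
        else:
            out.append(ch)  # start a new cluster
    return out


def cut_chars(text, start, end):
    """Hypothetical `cut -c` analogue that counts clusters, not code points."""
    return "".join(clusters(text)[start:end])


# 'e' followed by U+0301 COMBINING ACUTE ACCENT, then "tude"
sample = "e\u0301tude"

naive = sample[:1]                 # slices by code point, drops the accent
careful = cut_chars(sample, 0, 1)  # keeps 'e' and its accent together

# The same database supplies the BiDi class a display program needs.
hebrew_alef_class = unicodedata.bidirectional("\u05d0")
```

Slicing by code point leaves a bare "e" and strands the accent at the start of the remainder, while the cluster-aware version keeps the pair intact. Neither behaviour is possible without a per-character property lookup, which is the point about needing the data.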