On Thu, 20 Sep 2012 18:09:03 -0500 Naena Guru <[email protected]> wrote:
> Statements like,
>
> Using Unicode is recommended in preference to any code page because
> it has better language support and is less ambiguous than any of the
> code pages.
>
> are trying to assert untruths, that people tend to believe without
> concrete reasons. 'better language support' and 'less ambiguous'?

With anything but Windows-1252, the language support is likely to be made available via Unicode. The removal of ambiguity comes from two fronts:

1) Some ASCII characters are overworked, and have been split into separate characters in Unicode.
2) Tagging of 'plain text' is fairly poor.

> That statement is by Microsoft right in the registration of
> Windows-1252 that plainly contravenes Unicode:
> http://msdn.microsoft.com/en-US/goglobal/cc305145.aspx
>
> All languages in the Developed countries in the West including
> English, use Windows-1252!

Actually, I think Wales counts as a developed region. Windows-1252 does not support accents on 'w'. Presumably you are treating the dots above in Irish as irrelevant because the use of 'h' has largely replaced them. I presume you are unimpressed by the fact that Latin as written in my school textbooks could not be written in Windows-1252 - the vowels with macron and breve are unsupported by it!

Nowadays, in Unicode-capable systems, I usually use the minus sign (U+2212), which is not in Windows-1252, for negative numbers in text. It gives better results than hyphen-minus, both visually and for line breaking. I also get better cutting and pasting of single Greek letters if they are entered as characters rather than symbols. Oddly enough, I hardly notice the absence of an ohm symbol.

> I agree that following ISCII, whatever it is, might be the problem.

ISCII = Indian Script Code for Information Interchange.

> It is presumptuous to say, "the rest of the post is irrelevant".

It was irrelevant to compiling a list of Semitic transliteration characters.
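For anyone who wants to check the Windows-1252 gaps mentioned above, here is a minimal sketch. It assumes Python's built-in cp1252 codec is a fair stand-in for Windows-1252; the sample list and the helper names in_cp1252 and coverage are my own illustration, not part of any standard.

```python
# Sample characters: most are ones the post says Windows-1252 lacks,
# with two supported characters included for contrast.
SAMPLES = {
    "w with circumflex (Welsh)": "\u0175",
    "b with dot above (Irish)": "\u1e03",
    "a with macron (Latin)": "\u0101",
    "a with breve (Latin)": "\u0103",
    "minus sign": "\u2212",
    "ohm sign": "\u2126",
    "e with acute": "\u00e9",
    "hyphen-minus": "-",
}


def in_cp1252(ch: str) -> bool:
    """Return True if ch can be encoded with Python's cp1252 codec."""
    try:
        ch.encode("cp1252")
        return True
    except UnicodeEncodeError:
        return False


def coverage(samples=SAMPLES):
    """Map each sample label to whether cp1252 can encode its character."""
    return {label: in_cp1252(ch) for label, ch in samples.items()}


if __name__ == "__main__":
    for label, ok in coverage().items():
        print(f"{label:28} {'encodable' if ok else 'not encodable'}")
```

Running the script prints one line per sample, so it is easy to extend the table with other characters of interest.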
Semitic transliteration characters have the advantage of being in the Latin script, which in general behaves as programmers used to the Latin script expect. There was, though, an Egyptian transliteration character that gave grief, because of subtle differences in behaviour between a Greek and a Latin diacritic that had been unified. The solution was to declare the diacritic to be the Cyrillic version of the diacritic, because it had not been unified with the other two, so the character was decreed to be <U+0069 LATIN SMALL LETTER I, U+0486 COMBINING CYRILLIC PSILI PNEUMATA>.

Richard.
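As a small illustration of the sequence <U+0069, U+0486> named above, the following sketch uses Python's standard unicodedata module to list each code point's name and canonical combining class, and to check that NFC normalization leaves the pair as two code points. The helper name describe is hypothetical, and the results depend on the Unicode data shipped with the Python version in use.

```python
import unicodedata

# The two-code-point sequence named in the post.
SEQ = "\u0069\u0486"


def describe(text):
    """Return (code point, character name, combining class) for each character."""
    return [
        (f"U+{ord(c):04X}", unicodedata.name(c), unicodedata.combining(c))
        for c in text
    ]


def stays_decomposed(text):
    """Return True if NFC normalization leaves the text unchanged."""
    return unicodedata.normalize("NFC", text) == text


if __name__ == "__main__":
    for code_point, name, combining_class in describe(SEQ):
        print(code_point, name, combining_class)
    print("unchanged by NFC:", stays_decomposed(SEQ))
```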

