On Friday 20 December 2002 10:01 am, Philipp Reichmuth wrote:
> LGB> | Sorry, I don't understand. The length of the string U+0065 U+0301
> LGB> | certainly is 2, regardless of how the rendering engine displays this.
> LGB> | Of course, the rendering engine should render it as "é" because U+0301
> LGB> | is a combining character, but the string length is still 2.
>
> LGB> Not if I want to count the number of characters in the document.
>
> That's true. The problem is that you probably need the "real" string length
> information anyway for string operations. So if there's a need for a
> figure for number of characters in the document excluding combining
> characters, but including some combining characters (like Arabic or
> Hebrew vowels), one will need a secondary function.

Gotcha ;-)

Actually, the units of such a "layman's string length" could be called
"layman's characters" and could be used instead of all the Unicode stuff.
A table mapping "layman's characters" to sequences of Unicode characters
could be built at runtime, and could be pretty useful methinks.

Cheers, Kuba Ober
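
P.S. A rough sketch of the idea, in Python purely for illustration. The names
layman_chars, build_table and count_characters are made up here, and treating
"base character plus the combining marks that follow it" as one layman's
character is a simplifying assumption (proper grapheme segmentation has more
rules than this). The counted_marks parameter stands in for the "secondary
function" above: the caller lists the combining marks, such as Hebrew or
Arabic vowel points, that should still be counted.

```python
import unicodedata


def layman_chars(text):
    """Split text into base characters, each with the combining marks that follow it."""
    clusters = []
    for ch in text:
        # unicodedata.combining() returns a non-zero class for combining marks.
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


def build_table(text):
    """Map each distinct layman's character to its code point sequence, built at runtime."""
    return {
        cluster: tuple(f"U+{ord(ch):04X}" for ch in cluster)
        for cluster in layman_chars(text)
    }


def count_characters(text, counted_marks=frozenset()):
    """Count layman's characters, plus any combining marks the caller wants counted."""
    clusters = layman_chars(text)
    extra = sum(1 for c in clusters for ch in c[1:] if ch in counted_marks)
    return len(clusters) + extra


s = "e\u0301"                      # U+0065 U+0301
print(len(s))                      # 2, the "real" string length
print(count_characters(s))         # 1, the layman's length
print(build_table(s))
```

With a Hebrew example such as "\u05E9\u05B8" (shin plus qamats),
count_characters(text) gives 1, while passing counted_marks={"\u05B8"} gives 2,
so the same string can be counted either way depending on what the caller
considers a character.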