On Friday 20 December 2002 10:01 am, Philipp Reichmuth wrote:
> LGB> | Sorry, I don't understand. The length of the string U+0065 U+0301
> LGB> | certainly is 2, regardless of how the rendering engine displays
> LGB> | this. Of course, the rendering engine should render it as "é"
> LGB> | because U+0301 is a combining character, but the string length is
> LGB> | still 2.
>
> LGB> Not if I want to count the number of characters in the document.
>
> That's true. The problem is that you probably need the "real" string length
> information anyway for string operations. So if there's a need for a
> figure for the number of characters in the document that excludes combining
> characters but still counts some of them (like Arabic or Hebrew vowels),
> one will need a secondary function.

Gotcha ;-)

Actually, the units of such a "layman's string length" could be called
"layman's characters", and they could be used instead of all the Unicode stuff.

A table mapping "layman's characters" to sequences of Unicode characters could
be built at runtime, and could be pretty useful, methinks.
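A rough sketch of what such a "layman's string length" and table could look like, assuming Python's standard `unicodedata` module. The function names (`layman_characters`, `layman_length`, `cluster_table`) are hypothetical, and the rule used (a base code point plus any following marks of general category Mn, Mc or Me) is only a simplified approximation of Unicode extended grapheme clusters (UAX #29); it ignores Hangul jamo, ZWJ emoji sequences and similar cases. Note that under this rule Hebrew or Arabic vowel points are folded into their base letter, so counting them separately would need a different rule.

```python
import unicodedata


def layman_characters(s: str) -> list[str]:
    """Split s into hypothetical 'layman's characters': a base code point
    followed by any combining marks (general categories Mn, Mc, Me).
    Simplified approximation of UAX #29 grapheme clusters."""
    clusters: list[str] = []
    for ch in s:
        if clusters and unicodedata.category(ch).startswith("M"):
            # A mark attaches to the preceding cluster.
            clusters[-1] += ch
        else:
            # A base character (or a mark with nothing before it)
            # starts a new cluster.
            clusters.append(ch)
    return clusters


def layman_length(s: str) -> int:
    """Number of 'layman's characters' rather than code points."""
    return len(layman_characters(s))


def cluster_table(s: str) -> dict[str, list[str]]:
    """Map each distinct 'layman's character' in s to its code points."""
    return {c: [f"U+{ord(ch):04X}" for ch in c] for c in layman_characters(s)}


s = "e\u0301te\u0301"  # "été" written with U+0065 U+0301 sequences
print(len(s))            # 5 code points
print(layman_length(s))  # 3 layman's characters
print(cluster_table(s))
```

Building the table at runtime this way keeps the real code-point string intact for ordinary string operations, while the cluster view is only used where a user-facing count is wanted.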

Cheers, Kuba Ober
