Michael Schnell wrote:
>> It will at best be "friendly old school behaviour which works most of
>> the time, but which fails as soon as the strings are not completely
>> normalised because then you can have decomposed characters and
>> whatnot" (which in turn easily leads to security holes due to
>> incomplete checks, hard to reproduce bugs and "write once, debug
>> everywhere"-style behaviour).
> Sorry, I don't understand. What non-normalized behavior needs to be
> taken into account?
Remember that an individual code point does not necessarily represent
what a user would consider a character. Indeed, one character may be
representable in more than one way (either as a precomposed character or
as a sequence of a base character and a combining diacritic). And even if
we ignore combining diacritics, the number of console positions a string
takes is not necessarily equal to its code point count either, since many
CJK characters take two console positions.
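To make this concrete, here is a small Python sketch (Python is used only because its standard unicodedata module exposes the relevant Unicode properties; nothing here is FPC/RTL API). It shows that a precomposed and a decomposed "é" differ in code points until normalised, and that console width is yet another measure. The console_width helper is a hypothetical, rough approximation based on East Asian Width, not a complete wcwidth implementation.

```python
import unicodedata

# U+00E9 LATIN SMALL LETTER E WITH ACUTE (precomposed form)
precomposed = "\u00e9"
# U+0065 'e' followed by U+0301 COMBINING ACUTE ACCENT (decomposed form)
decomposed = "e\u0301"

def console_width(s: str) -> int:
    """Hypothetical helper: approximate console columns for s.
    Combining marks count as 0, East Asian Wide/Fullwidth characters
    as 2, everything else as 1. Control characters, emoji sequences
    and ambiguous-width characters are not handled."""
    width = 0
    for ch in s:
        if unicodedata.combining(ch):
            continue
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width

if __name__ == "__main__":
    # Same user-perceived character, different code point sequences.
    print(precomposed == decomposed)            # False
    print(len(precomposed), len(decomposed))    # 1 2
    # They only compare equal after normalisation.
    print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
    print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True
    # Console width is a third, independent measure.
    cjk = "\u6f22\u5b57"  # two CJK ideographs
    print(len(cjk), console_width(cjk))         # 2 4
    print(console_width(decomposed))            # 1
```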
Given these facts, code point counts and indexes are not much more
useful than code unit indexes and counts.
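A rough Python illustration of that point, assuming a decomposed "café" as input: slicing by code units (UTF-8 bytes here) can cut an encoded sequence in half, while slicing by code points produces valid text but can still silently drop the accent from its base letter. The naive_char_count helper is hypothetical and only skips combining marks. It is not real grapheme segmentation (Unicode UAX #29), which also has to deal with Hangul jamo, emoji ZWJ sequences and more. That gap is exactly where an external library would come in.

```python
import unicodedata

word = "cafe\u0301"  # "café" spelled with 'e' + U+0301 COMBINING ACUTE ACCENT

# Code unit view (UTF-8): U+0301 takes two bytes, 0xCC 0x81.
utf8 = word.encode("utf-8")
utf16_units = len(word.encode("utf-16-le")) // 2  # UTF-16 code units

# Cutting after 5 bytes leaves a truncated, invalid UTF-8 sequence.
truncated_bytes = utf8[:5]

# Code point view: the slice is valid text, but the accent is gone.
truncated_text = word[:4]

def naive_char_count(s: str) -> int:
    """Hypothetical helper: count code points that are not combining
    marks. Only a crude stand-in for grapheme clusters (UAX #29)."""
    return sum(1 for ch in s if not unicodedata.combining(ch))

if __name__ == "__main__":
    print(len(utf8), utf16_units, len(word))  # 6 5 5
    print(truncated_bytes)                    # b'cafe\xcc'
    print(truncated_text)                     # cafe
    print(naive_char_count(word))             # 4
```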
And if you need something better than either code point count or code
unit count, then you have little choice but to pull in an external
library. Pulling in an external library with a relatively unstable
interface is not something the compiler or RTL should be doing IMO.
_______________________________________________
fpc-devel maillist - [email protected]
http://lists.freepascal.org/mailman/listinfo/fpc-devel