Michael Schnell wrote:


It will at best be "friendly old school behaviour which works most of the time, but which fails as soon as the strings are not completely normalised because then you can have decomposed characters and whatnot" (which in turn easily leads to security holes due to incomplete checks, hard to reproduce bugs and "write once, debug everywhere"-style behaviour).
Sorry, I don't understand. What non-normalized behavior needs to be taken into account?
Remember that an individual code point does not necessarily represent what a user would consider a character. Indeed, one character may be representable in more than one way (either as a precomposed character or as a sequence of a base character and a combining diacritic). And even if we ignore combining diacritics, the number of console positions a string takes is not necessarily equal to the code point count either, since many CJK characters take two console positions.

Given these facts, code point counts and indexes are not much more useful than code unit indexes and counts.
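
To make that concrete, here is a minimal sketch in Python (chosen only because its standard unicodedata module makes the numbers easy to show; nothing here is FPC or RTL code). The describe helper and its column estimate are assumptions made up for illustration: the estimate counts East Asian wide and fullwidth characters as two columns and skips combining marks, which is only a rough approximation of what a real terminal does.

```python
import unicodedata

def describe(text):
    """Return (code points, UTF-8 code units, UTF-16 code units, estimated columns).

    Hypothetical helper for illustration only. The column estimate is a
    rough approximation, not what any particular terminal guarantees.
    """
    code_points = len(text)
    utf8_units = len(text.encode("utf-8"))
    utf16_units = len(text.encode("utf-16-le")) // 2
    columns = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining marks take no column of their own
        # Wide ("W") and fullwidth ("F") characters take two columns.
        columns += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return code_points, utf8_units, utf16_units, columns

precomposed = "\u00e9"   # e with acute accent as one precomposed code point
decomposed = "e\u0301"   # base letter e followed by COMBINING ACUTE ACCENT
cjk = "\u65e5\u672c"     # two CJK ideographs

for sample in (precomposed, decomposed, cjk):
    print(describe(sample))

# The two spellings look identical to a user but compare unequal
# until both are brought to the same normalization form.
print(precomposed == decomposed)
print(unicodedata.normalize("NFC", decomposed) == precomposed)
```

None of the three counts matches the estimated column count for all three samples, and the two spellings of the accented letter compare equal only after normalization.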

And if you need something better than either code point count or code unit count then you have little choice but to pull in an external library. Pulling in an external library with a relatively unstable interface is not something the compiler or RTL should be doing IMO.

