Re: [fpc-devel] Unicode support in RTL - Roadmap

Jonas Maebe Fri, 21 Nov 2008 07:56:33 -0800


On 21 Nov 2008, at 16:16, Michael Schnell wrote:

So UTF8ElementLength('Ü') would be 2 and UTF8PointLength('Ü') would be 1.
Or 2, depending on whether it's precomposed or decomposed.
I seem to remember that we discussed this some time ago and the result was that the composed (Mac style?)

Decomposed and precomposed have nothing to do with Windows vs Mac OS X vs Linux vs whatever. They are both equally valid ways to represent UTF strings and both have their uses (on all platforms). All programs should also be prepared to deal with them, since you never know what kind of input you will get.
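
To make the difference concrete, here is a minimal sketch in Python (used only for illustration; it is not FPC code). The two helper functions are hypothetical stand-ins for the proposed UTF8ElementLength and UTF8PointLength routines and assume that "element" means a UTF-8 byte and "point" means a Unicode code point.

```python
import unicodedata

# Two equally valid encodings of the same visible character.
precomposed = "\u00DC"   # U+00DC LATIN CAPITAL LETTER U WITH DIAERESIS
decomposed = "U\u0308"   # U+0055 followed by U+0308 COMBINING DIAERESIS


def utf8_element_length(s: str) -> int:
    """Count UTF-8 bytes (hypothetical stand-in for UTF8ElementLength)."""
    return len(s.encode("utf-8"))


def utf8_point_length(s: str) -> int:
    """Count code points (hypothetical stand-in for UTF8PointLength)."""
    return len(s)


for name, text in (("precomposed", precomposed), ("decomposed", decomposed)):
    print(name, utf8_element_length(text), utf8_point_length(text))

# Normalization converts between the two forms.
print(unicodedata.normalize("NFC", decomposed) == precomposed)
print(unicodedata.normalize("NFD", precomposed) == decomposed)
```

The precomposed form is 2 bytes and 1 code point, the decomposed form is 3 bytes and 2 code points, and both render as the same character.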

characters in fact are a single code point (Unicode character) that consists of two (maybe more?) complete code points that are tied together by some special coding, so IMHO it can be considered as a single Unicode character in both cases. If this would result in a huge table of possibly composed characters I think we would stick to the concept of providing a decent functionality and restrict ourselves to those that are currently used by the "customers" we normally address (Mac in Europe and America).

I think you are talking about a different "we". Further, inventing our own meanings of what a "code point" or "unicode character" means is an extremely bad idea (you'd also have to rename UTF*Point* routines to UTF*FPCLikeChar* so they properly indicate the fact that they do not deal with code points). UTF by itself already has enough variations to deal with, we will not add our own.

which does not make sense if UTF8PointLength(utfstring_1) is smaller than UTF8PointLength(utfstring_2).
It does not make any sense under any circumstances, because there is no way for "UTF8PointSetLength" to know how many bytes it has to allocate when you pass a value (any value, regardless of where it comes from) to it.
If UTF8PointLength(utfstring_1) is greater than UTF8PointLength(utfstring_2) no new bytes need to be allocated
but the function is just equivalent to
utfstring1 := UTF8PointCopy(utfstring1, 1, UTF8PointLength(utfstring_2));
To me this does not seem to impose any problem.

Except if the point is to reserve exactly enough space for utfstring1 and to overwrite its contents with something else afterwards (using move() or whatever). That's a very common use of setlength (at least in the FPC run time library, and I guess elsewhere as well). The fact that it also doesn't work if the string has to be made longer is basically the same problem.
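
The allocation problem can be shown with a short Python sketch (illustration only, not FPC code). The helper name utf8_size_bounds is hypothetical; it relies only on the fact that one code point takes between 1 and 4 bytes in UTF-8.

```python
def utf8_size_bounds(point_count: int) -> tuple[int, int]:
    """Smallest and largest UTF-8 byte size of a string with this many code points."""
    return point_count * 1, point_count * 4


# Four strings, each exactly 3 code points long.
samples = [
    "abc",                               # ASCII, 1 byte per code point
    "\u00e4\u00f6\u00fc",                # precomposed umlauts, 2 bytes each
    "\u65e5\u672c\u8a9e",                # CJK ideographs, 3 bytes each
    "\U0001F600\U0001F601\U0001F602",    # emoji, 4 bytes each
]

for text in samples:
    print(len(text), "code points ->", len(text.encode("utf-8")), "bytes")

print(utf8_size_bounds(3))
```

A length given in code points therefore only bounds the buffer size; it does not determine it, which is why a setlength-then-move() pattern cannot be expressed in code points alone.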

Your system just does not work, and the more examples you give the more it falls down, as far as I can see. Please first write a wiki page explaining how to deal with all cases, or at least noting which cases will not work. Only then it is possible to decide on whether or not it is both feasible and worthwhile to go through the trouble of implementing all this. Without it, I feel I am mainly wasting my time writing these mails because it seems you haven't thought it through yet at all.



Jonas
