On 21 Nov 2008, at 16:16, Michael Schnell wrote:
> So UTF8ElementLength('Ü') would be 2 and UTF8PointLength('Ü')
> would be 1.

Or 2, depending on whether it's precomposed (U+00DC) or decomposed
(U+0055 followed by U+0308).
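
To make the two cases concrete, here is a minimal sketch in Free Pascal.
UTF8CodePointCount is a hypothetical helper written only for this
example (it is not an existing RTL routine), and it assumes well-formed
UTF-8 input. The strings are spelled as byte escapes so the result does
not depend on the encoding of the source file, and the program assumes
no {$codepage} directive is in effect.

```pascal
program ComposedVsDecomposed;
{$mode objfpc}{$H+}

{ Hypothetical helper, not part of the RTL: counts code points in a
  UTF-8 string, assuming the input is well-formed. Every byte that is
  not a continuation byte (bit pattern 10xxxxxx) starts a code point. }
function UTF8CodePointCount(const s: AnsiString): SizeInt;
var
  i: SizeInt;
begin
  Result := 0;
  for i := 1 to Length(s) do
    if (Ord(s[i]) and $C0) <> $80 then
      Inc(Result);
end;

const
  { U+00DC LATIN CAPITAL LETTER U WITH DIAERESIS, encoded as UTF-8 }
  Precomposed = #$C3#$9C;
  { U+0055 'U' followed by U+0308 COMBINING DIAERESIS, encoded as UTF-8 }
  Decomposed = 'U'#$CC#$88;
var
  a, b: AnsiString;
begin
  a := Precomposed;
  b := Decomposed;
  { Length() reports bytes; the helper reports code points. }
  WriteLn('precomposed: ', Length(a), ' bytes, ',
          UTF8CodePointCount(a), ' code points');
  WriteLn('decomposed:  ', Length(b), ' bytes, ',
          UTF8CodePointCount(b), ' code points');
end.
```

Both strings render as the same 'Ü', yet the precomposed form is 2
bytes and 1 code point while the decomposed form is 3 bytes and 2 code
points.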

> I seem to remember that we discussed this some time ago and the
> result was that the composed (Mac style?)

Decomposed and precomposed have nothing to do with Windows vs Mac OS X
vs Linux vs whatever. They are both equally valid ways to represent
UTF strings and both have their uses (on all platforms). All programs
should also be prepared to deal with them, since you never know what
kind of input you will get.

> characters in fact are a single code point (Unicode character) that
> consists of two (maybe more?) complete code points that are tied
> together by some special coding, so IMHO it can be considered as a
> single Unicode character in both cases. If this would result in a
> huge table of possibly composed characters I think we would stick to
> the concept of providing a decent functionality and restrict ourselves
> to those that are currently used by the "customers" we normally
> address (Mac in Europe and America).

I think you are talking about a different "we". Further, inventing our
own meaning for "code point" or "Unicode character" is an
extremely bad idea (you'd also have to rename UTF*Point* routines to
UTF*FPCLikeChar* so they properly indicate the fact that they do not
deal with code points). UTF by itself already has enough variations to
deal with; we will not add our own.

>>> which does not make sense if UTF8PointLength(utfstring_1) is
>>> smaller than UTF8PointLength(utfstring_2).
>>
>> It does not make any sense under any circumstances, because there
>> is no way for "UTF8PointSetLength" to know how many bytes it has to
>> allocate when you pass a value (any value, regardless of where it
>> comes from) to it.
>
> If UTF8PointLength(utfstring_1) is greater than
> UTF8PointLength(utfstring_2) no new bytes need to be allocated
> but the function is just equivalent to
>
>   utfstring1 := UTF8PointCopy(utfstring1, 1,
>                               UTF8PointLength(utfstring_2));
>
> To me this does not seem to pose any problem.

Except if the point is to reserve exactly enough space for utfstring1
and to overwrite its contents with something else afterwards (using
move() or whatever). That's a very common use of setlength (at least
in the FPC run time library, and I guess elsewhere as well). The fact
that it also doesn't work if the string has to be made longer is
basically the same problem.
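
As a rough illustration of that idiom, the sketch below reserves space
with SetLength and then overwrites it with Move. The two constants are
made up for this example and both hold exactly three code points, but
one needs 3 bytes and the other 9. A code-point-based routine such as
the proposed "UTF8PointSetLength" would be handed the value 3 in either
case and could not tell which allocation is meant.

```pascal
program SetLengthNeedsBytes;
{$mode objfpc}{$H+}

const
  { Illustrative data only: both constants hold three code points. }
  AsciiText = 'abc';                                 { 3 bytes }
  EuroText  = #$E2#$82#$AC#$E2#$82#$AC#$E2#$82#$AC;  { U+20AC x 3, 9 bytes }
var
  src, dst: AnsiString;
begin
  src := EuroText;
  dst := AsciiText;

  { Reserve exactly as many bytes as the new contents need, then
    overwrite the old contents. This works because Length(src) is a
    byte count, which is the unit SetLength and Move operate in. }
  SetLength(dst, Length(src));
  Move(src[1], dst[1], Length(src));

  WriteLn('bytes in dst: ', Length(dst));
  WriteLn('dst equals src: ', dst = src);
end.
```

The same applies in the other direction: knowing that a string should
end up with 3 code points says nothing about whether 3, 6, 9 or 12
bytes have to be available before the data is moved in.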
Your system just does not work, and the more examples you give the
more it falls down, as far as I can see. Please first write a wiki
page explaining how to deal with all cases, or at least noting which
cases will not work. Only then is it possible to decide whether or
not it is both feasible and worthwhile to go through the trouble of
implementing all this. Without it, I feel I am mainly wasting my time
writing these mails because it seems you haven't thought it through
yet at all.

Jonas