On 2008-11-23 14:49, Daniël Mantione wrote:
On Sun, 23 Nov 2008, Jonas Maebe wrote:
On 23 Nov 2008, at 13:31, Daniël Mantione wrote:
For an IDE, this is a little bit more complicated. I.e. searching for
a ç in a source file needs to find both the composed and the
decomposed variant, and in the case of UTF-8, this character can be
encoded in 1, 2, 3 or 4 bytes which all need to be found. This is
where UTF-16 and UTF-32 start to make sense.
Characters can also be decomposed in UTF-16 and in UTF-32 (for the
same reasons as in UTF-8).
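To make the composed/decomposed point concrete, here is a small sketch in Python (not the language under discussion, chosen only because its standard library exposes Unicode normalization). The helper name `sizes` is made up for this illustration. It counts code points and encoded bytes for both spellings of ç:

```python
import unicodedata

composed = "\u00e7"      # ç as one precomposed code point
decomposed = "c\u0327"   # c followed by COMBINING CEDILLA

def sizes(s):
    """Return (code points, UTF-8 bytes, UTF-16 bytes, UTF-32 bytes)."""
    return (len(s),
            len(s.encode("utf-8")),
            len(s.encode("utf-16-le")),
            len(s.encode("utf-32-le")))

print(sizes(composed))    # (1, 2, 2, 4)
print(sizes(decomposed))  # (2, 3, 4, 8)

# The two spellings differ as raw code point sequences in every encoding...
print(composed == decomposed)                                # False
# ...and only compare equal after normalizing to the same form.
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
```

The decomposed spelling is two code points in all three encodings, so a wider code unit alone does not make the two variants compare equal.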
I am aware of that, but the combining cedilla is not in the "easy to
process range" of UTF-8. In other words, you cannot do
"if char[i]=combining_cedille" in UTF-8.
Instead, in UTF-8, you need to make sure the string has enough characters
left, and then compare multiple characters. Heck, you even need to take
care of the fact that the combining cedilla can be encoded in 2, 3 or 4
bytes.
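The difference between the two checks can be sketched as follows, again in Python and purely as an illustration. The function names `cedilla_at_utf8` and `cedilla_at_utf32` are hypothetical. The first works on raw UTF-8 bytes and needs a bounds check plus a multi-byte comparison; the second works on a sequence of code points and is a single comparison:

```python
COMBINING_CEDILLA = "\u0327"
CEDILLA_UTF8 = COMBINING_CEDILLA.encode("utf-8")  # b"\xcc\xa7"

def cedilla_at_utf8(buf: bytes, i: int) -> bool:
    """Byte-level test: check remaining length, then compare several bytes."""
    n = len(CEDILLA_UTF8)
    return i + n <= len(buf) and buf[i:i + n] == CEDILLA_UTF8

def cedilla_at_utf32(text: str, i: int) -> bool:
    """Code-point test: one comparison, like char[i]=combining_cedille."""
    return text[i] == COMBINING_CEDILLA

text = "garc\u0327on"        # "garçon" spelled with a decomposed ç
buf = text.encode("utf-8")

print(cedilla_at_utf8(buf, 4))    # True
print(cedilla_at_utf8(buf, 5))    # False (middle of the two-byte sequence)
print(cedilla_at_utf32(text, 4))  # True
```

Python strings are indexed by code point, which is why `text[i]` stands in for a 4-byte-per-character string here.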
This is one of the million and one small details that one has to keep in
mind while programming.
What I think would be more sensible is that, instead of using all these
variable sizes and all, simply use 4-byte/char strings and compose (in
the UTF sense) everything into that string.
You do this once, when importing/loading text into your app. From then on,
everything is just like the good old string, except that it is a 4-byte
per char string instead of a 1-byte one.
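A minimal sketch of that "compose once on load" idea, in Python rather than Pascal and with a made-up function name `load_text`. It assumes the input is UTF-8 and that NFC is the composition meant here; the result is a list of code point values, each of which fits in 4 bytes:

```python
import unicodedata

def load_text(raw: bytes) -> list:
    """Decode UTF-8, compose to NFC, return one integer per code point."""
    composed = unicodedata.normalize("NFC", raw.decode("utf-8"))
    return [ord(ch) for ch in composed]

# Decomposed input: c + COMBINING CEDILLA becomes the single code point U+00E7.
units = load_text("garc\u0327on".encode("utf-8"))
print(len(units))         # 6
print(units[3] == 0xE7)   # True

# Caveat: composition only helps where Unicode defines a precomposed form.
# There is none for x + cedilla, so that pair stays two code points.
print(len(load_text("x\u0327".encode("utf-8"))))  # 2
```

After loading, a search for ç is a plain element comparison, though sequences without a precomposed form still span more than one element.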
Now, my question is this: How would I create a 'FourByteString' type,
reference counted etc. just like the usual 'String'?
How hard is it?
Can someone like me, who does not speak assembler, do it?
If so, where do I begin copy&pasting from 'string'?
_______________________________________________
fpc-devel maillist - [email protected]
http://lists.freepascal.org/mailman/listinfo/fpc-devel