On Sun, 23 Nov 2008 13:49:32 +0100 (CET) Daniël Mantione <[EMAIL PROTECTED]> wrote:
> On Sun, 23 Nov 2008, Jonas Maebe wrote:
>
> > On 23 Nov 2008, at 13:31, Daniël Mantione wrote:
> >
> >> For an IDE, this is a little bit more complicated. I.e. searching
> >> for a ç in a source file needs to find both the composed and the
> >> decomposed variant, and in the case of UTF-8, this character can
> >> be encoded in 1, 2, 3 or 4 bytes which all need to be found. This
> >> is where UTF-16 and UTF-32 start to make sense.
> >
> > Characters can also be decomposed in UTF-16 and in UTF-32 (for the
> > same reasons as in UTF-8).
>
> I am aware of that, but the combining cedilla is not in the "easy to
> process range" of UTF-8. In other words, you cannot do
> "if char[i]=combining_cedille" in UTF-8.
>
> Instead, in UTF-8, you need to make sure the string has enough
> characters left, and then compare multiple characters. Heck, you even
> need to take care of the fact that the combining cedilla can be
> encoded in 2, 3 or 4 bytes.

Which means that there are three different Unicode codes for this
character, which means a single if-equal does not work in UTF-16 or
UTF-32 either:

  if UTF8CharacterToUnicode(@s[i],CharLen) in [cedille1,cedille2,cedille3] then

Mattias
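
A minimal Free Pascal sketch of the approach discussed above: decode the code point at the current byte position first, then compare code points, so the byte length of the encoding no longer matters to the comparison. The helper names DecodeUTF8 and FindCCedilla are hypothetical and written only for this illustration; they are not RTL or LCL routines. The sketch assumes that "ç" is matched either as the precomposed U+00E7 or as "c" (U+0063) followed by the combining cedilla (U+0327). A `case` statement is used instead of `in [...]` because, as far as I know, Free Pascal set elements are limited to ordinal values 0..255 and U+0327 lies outside that range.

```pascal
program FindCedilla;

{$mode objfpc}{$H+}

const
  // Code points assumed for this illustration.
  CP_C_CEDILLA         = $00E7; // precomposed "ç"
  CP_LATIN_SMALL_C     = $0063; // "c"
  CP_COMBINING_CEDILLA = $0327; // combining cedilla
  CP_REPLACEMENT       = $FFFD; // returned for malformed input

// Hypothetical helper: decodes the UTF-8 sequence starting at byte s[i].
// CharLen receives the number of bytes consumed. Truncated or malformed
// sequences yield CP_REPLACEMENT with CharLen = 1. Overlong forms are not
// rejected here, to keep the sketch short.
function DecodeUTF8(const s: AnsiString; i: Integer;
  out CharLen: Integer): Cardinal;
var
  b: Byte;
  k: Integer;
begin
  b := Ord(s[i]);
  if b < $80 then
  begin
    CharLen := 1;
    Result := b;
    Exit;
  end
  else if (b and $E0) = $C0 then
  begin
    CharLen := 2;
    Result := b and $1F;
  end
  else if (b and $F0) = $E0 then
  begin
    CharLen := 3;
    Result := b and $0F;
  end
  else if (b and $F8) = $F0 then
  begin
    CharLen := 4;
    Result := b and $07;
  end
  else
  begin
    // Stray continuation byte or invalid lead byte.
    CharLen := 1;
    Result := CP_REPLACEMENT;
    Exit;
  end;

  // Make sure the string has enough bytes left.
  if i + CharLen - 1 > Length(s) then
  begin
    CharLen := 1;
    Result := CP_REPLACEMENT;
    Exit;
  end;

  // Fold in the continuation bytes (each must look like 10xxxxxx).
  for k := 1 to CharLen - 1 do
  begin
    b := Ord(s[i + k]);
    if (b and $C0) <> $80 then
    begin
      CharLen := 1;
      Result := CP_REPLACEMENT;
      Exit;
    end;
    Result := (Result shl 6) or (b and $3F);
  end;
end;

// Returns the 1-based byte index of the first "ç" in s, composed or
// decomposed, or 0 if there is none.
function FindCCedilla(const s: AnsiString): Integer;
var
  i, Len, NextLen: Integer;
begin
  Result := 0;
  i := 1;
  while i <= Length(s) do
  begin
    case DecodeUTF8(s, i, Len) of
      CP_C_CEDILLA:
        begin
          Result := i;
          Exit;
        end;
      CP_LATIN_SMALL_C:
        // Decomposed form: "c" immediately followed by U+0327.
        if (i + Len <= Length(s)) and
           (DecodeUTF8(s, i + Len, NextLen) = CP_COMBINING_CEDILLA) then
        begin
          Result := i;
          Exit;
        end;
    end;
    Inc(i, Len);
  end;
end;

begin
  // Bytes are spelled out so the source file encoding does not matter.
  WriteLn(FindCCedilla('gar'#$C3#$A7'on'));   // composed:   C3 A7
  WriteLn(FindCCedilla('garc'#$CC#$A7'on'));  // decomposed: 63 CC A7
  WriteLn(FindCCedilla('garcon'));            // no cedilla at all
end.
```

The same loop structure would carry over to UTF-16 or UTF-32 input with a different decoder, since the composed/decomposed comparison happens on code points in every case. A real search would more likely normalize both the pattern and the text first instead of special-casing one character.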
