Daniël Mantione wrote:
>
> On Mon, 7 May 2007, Christos Chryssochoidis wrote:
>
>> Daniël Mantione wrote:
>>> Not possible, a widestring is UCS-2/UTF-16.
>> I defined a widestring with 7 characters (code points), and the length()
>> function returned the value 15. Of the 7 code points of that widestring only
>> one of them was greater than $07FF (the maximum code point which can be
>> encoded in 2 bytes under UTF-8). When I changed that character with another
>> one with code not greater than $07FF, length() returned value 14... I also
>> printed the byte values of one of the widestring's widechars, and the values
>> printed indicated UTF-8 encoding.
>
> Yes, the program output is utf-8 on OS-X, because this is the native
> encoding for OS-X. However, widestrings are not utf-8. Can you show your
> code?
>
> Daniël
>
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> fpc-pascal maillist - [EMAIL PROTECTED]
> http://lists.freepascal.org/mailman/listinfo/fpc-pascal

OK, I figured out what happened. The source file was saved in UTF-8, but I hadn't put the compiler directive {$CODEPAGE UTF8} in it, so the compiler was apparently treating each byte of the UTF-8 string literal as a separate character. That would explain why length() returned 15 (the UTF-8 byte count) for a 7-character string. After adding the directive, almost everything worked fine: length() returned the right number of Unicode characters, and subscripting the widestring returned the right character.

The one remaining issue was output. Since widechar and widestring are, as you said, UTF-16, while my Mac OS X console uses UTF-8, I had to wrap the individual widechars or the whole widestring in utf8encode() before passing them to write() for the results to display correctly.
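
For anyone who finds this thread later, here is a minimal sketch of the fix described above. The program name, variable names and the string literal are made up for illustration (this is not my original test code), and it assumes the file itself is saved as UTF-8. It reflects how things behaved with the compiler I was using; newer FPC versions may handle console conversion differently.

```pascal
program WideDemo;

{$mode objfpc}{$H+}
{$CODEPAGE UTF8}  // declares that this source file is encoded in UTF-8

var
  ws, one: WideString;  // hypothetical names, for illustration only
  i: Integer;
begin
  // Hypothetical literal: six Greek letters (2 bytes each in UTF-8)
  // plus the euro sign U+20AC (3 bytes in UTF-8), i.e. 15 bytes on disk.
  ws := 'αβγδεζ€';

  // With the directive in place this should report 7 UTF-16 code units.
  // Without it, each UTF-8 byte of the literal ends up as its own widechar.
  WriteLn(Length(ws));

  // Subscripting yields a UTF-16 widechar; print its value in hex.
  WriteLn(HexStr(Ord(ws[7]), 4));

  // The console expects UTF-8, so convert before writing.
  WriteLn(UTF8Encode(ws));

  // The same applies when writing one character at a time.
  for i := 1 to Length(ws) do
  begin
    one := ws[i];
    Write(UTF8Encode(one), ' ');
  end;
  WriteLn;
end.
```

Removing the {$CODEPAGE UTF8} line from this sketch should bring back the symptom from my earlier message, with length() counting bytes rather than characters.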

Thanks for your help,

Christos
