On 18 Oct 2008, at 16:32, Zaher Dirkey wrote:

I noticed that UTF-8 text from Delphi is not compatible with Lazarus/FPC, and vice versa.
It corrupts the characters.

When you set the encoding of an FPC source file to UTF-8 (either by adding a BOM or by using {$codepage utf-8}), then
a) all string constants containing non-ASCII UTF-8 characters are decoded and converted to UTF-16 (widestring) at compile time, and
b) at run time, these widestrings are converted again to the active code page when they are assigned to ansistrings or shortstrings.
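The two steps above can be modelled roughly in Python. This is a sketch under stated assumptions, not FPC code: a Python `str` stands in for a widestring, the active code page is assumed (hypothetically) to be Windows-1252, and the sample text "café سلام" is made up. Unrepresentable characters are dropped here to match the description; depending on the widestring manager, a real program may substitute a placeholder such as '?' instead.

```python
# Rough Python model of the UTF-8 source file case; NOT FPC code.
# Assumption (hypothetical): the active ANSI code page is Windows-1252.
ACTIVE_CODE_PAGE = "cp1252"

def compile_constant_with_codepage(source_bytes: bytes) -> str:
    """Step a): with a BOM or {$codepage utf-8}, the UTF-8 bytes of the
    constant are decoded into a UTF-16 widestring (modelled as a str)."""
    return source_bytes.decode("utf-8")

def assign_to_ansistring(wide: str) -> bytes:
    """Step b): at run time the widestring is converted to the active code
    page; characters it cannot represent are lost (modelled by dropping them)."""
    return wide.encode(ACTIVE_CODE_PAGE, errors="ignore")

source = "café سلام".encode("utf-8")   # bytes as they sit in the .pas file
wide = compile_constant_with_codepage(source)
ansi = assign_to_ansistring(wide)
print(ansi)  # b'caf\xe9 '
```

The Arabic letters are gone, and the surviving 'é' is the single cp1252 byte 0xE9 rather than the UTF-8 pair 0xC3 0xA9, so the ansistring does not hold UTF-8 either way.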

If a file contains neither a BOM nor a {$codepage yyy} directive, then string constants are not parsed in any way and appear literally in the compiled program. When you assign them to a widestring, they are "converted" from ANSI to UTF-16, so the result contains garbage.
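A similar rough Python model of the case without a BOM or {$codepage} directive, again assuming (hypothetically) that the active code page is Windows-1252 and using a made-up sample string. It only illustrates the effect described above; it is not how the compiler is implemented.

```python
# Rough Python model of a source file with neither a BOM nor {$codepage}; NOT FPC code.
# Assumption (hypothetical): the active ANSI code page is Windows-1252.
ACTIVE_CODE_PAGE = "cp1252"

source = "café".encode("utf-8")  # b'caf\xc3\xa9' as stored in the .pas file

# The constant is not parsed: its bytes end up in the program unchanged,
# so an ansistring holding it happens to contain UTF-8.
ansi_constant = source

# Assigning it to a widestring "converts" it as if it were ANSI text,
# so each byte of a multi-byte UTF-8 sequence becomes its own character.
wide = ansi_constant.decode(ACTIVE_CODE_PAGE)
print(wide)  # cafÃ©
```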

This means that if you (ab)use ansistrings to store UTF-8 strings (rather than strings in the current ANSI encoding), you have to either
a) not use a BOM or {$codepage} directive, or
b) use UTF8Encode(widestringconstant).
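Both workarounds can be sketched with the same kind of Python model (not FPC code; the sample text is made up). Here `str.encode("utf-8")` merely stands in for UTF8Encode: the point is that the widestring is turned back into UTF-8 explicitly instead of going through the run-time conversion to the active code page.

```python
# Rough Python model of the two workarounds; NOT FPC code.
text = "café سلام"
source = text.encode("utf-8")  # bytes as stored in the .pas file

# a) No BOM / {$codepage}: the bytes are copied into the ansistring as-is.
option_a = source

# b) With {$codepage utf-8}: the constant becomes a widestring at compile
#    time, and UTF8Encode converts it back to UTF-8 bytes explicitly,
#    bypassing the conversion to the active code page.
widestring_constant = source.decode("utf-8")
option_b = widestring_constant.encode("utf-8")  # stands in for UTF8Encode

print(option_a == option_b == source)  # True
```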

Otherwise, characters that cannot be represented in the active code page will disappear during the run-time conversion from widestring to ansistring (and you won't end up with a UTF-8 string in any case).


Jonas
_______________________________________________
fpc-pascal maillist  -  fpc-pascal@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-pascal
