Graeme Geldenhuys wrote on Wed, 11 May 2016:

In my application I enable unicodestring mode. So I'm reading data from
a Firebird database. The data is stored as UTF-8 in a VarChar field. The
DB connection is set up as UTF-8.  Now let's assume my FreeBSD box is set
up with a default encoding of Latin-1.

So I read the UTF-8 data from the database, and somewhere inside the SqlDB
code it gets assigned to a TField's String property, i.e. a UTF-8 ->
Latin-1 conversion.

This depends on how sqlDB is implemented, and I have absolutely no clue about that (other than what LacaK wrote).

As mentioned at http://wiki.freepascal.org/FPC_Unicode_support#Dynamic_code_page , conversions on assignment only happen when the *declared* code page of the target string is different from that of the source string (other than the special case for RawByteString). So if sqlDB only uses plain String with {$h+} and/or AnsiString, then no conversions will happen anywhere in the scenario you describe since it will just assign ansistrings with declared code page CP_ACP to each other.
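
A minimal sketch of that rule, not taken from sqlDB: TLatin1String and the variable names are made up for illustration, and the test bytes are built by hand. It assumes FPC 3.x, and on Unix it pulls in cwstring so that a widestring manager is available for the conversion.

```pascal
program declared_cp_demo;
{$mode objfpc}{$h+}
{$ifdef unix}uses cwstring;{$endif}

type
  // Hypothetical type: an ansistring whose declared code page is Latin-1.
  TLatin1String = type AnsiString(28591);

var
  src, same: AnsiString;   // declared code page: CP_ACP
  latin1: TLatin1String;   // declared code page: 28591
begin
  // Build the two UTF-8 bytes of U+00E9 at run time and label them as UTF-8
  // without touching the data.
  SetLength(src, 2);
  src[1] := #$C3;
  src[2] := #$A9;
  SetCodePage(RawByteString(src), CP_UTF8, false);

  same := src;     // identical declared code pages: no conversion
  latin1 := src;   // declared code pages differ: data is converted

  writeln('same:   cp=', StringCodePage(same), ' length=', Length(same));
  writeln('latin1: cp=', StringCodePage(latin1), ' length=', Length(latin1));
end.
```

If the conversion is available on the system, "same" should still report code page 65001 and two bytes, while "latin1" should report 28591 and a single byte.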

Then I read the field value into my application, i.e. Latin-1 -> UTF-16.

If sqlDB correctly sets the dynamic code page of the strings it creates via SetCodePage(x,CP_UTF8,false), then those strings have declared code page CP_ACP and dynamic code page CP_UTF8, and when you assign them to your unicodestrings they will be converted from UTF-8 to UTF-16 at that point.

If it does not set the dynamic code page of the strings it creates to the appropriate encoding, then you will indeed get data corruption at this point, because the UTF-8 encoded data will be interpreted as Latin-1 and then be "converted" to UTF-16.
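
The two cases can be compared in a small sketch. MakeBytes is a hypothetical helper standing in for whatever code creates the strings (it is not sqlDB code), and labelling the bytes as 28591 merely simulates a system whose default code page is Latin-1. FPC 3.x is assumed, with cwstring on Unix.

```pascal
program dynamic_cp_demo;
{$mode objfpc}{$h+}
{$ifdef unix}uses cwstring;{$endif}

// Hypothetical helper: returns the UTF-8 bytes of U+00E9, labelled with the
// dynamic code page cp. The bytes themselves are never converted here.
function MakeBytes(cp: TSystemCodePage): AnsiString;
begin
  SetLength(Result, 2);
  Result[1] := #$C3;
  Result[2] := #$A9;
  SetCodePage(RawByteString(Result), cp, false);
end;

var
  tagged, mislabelled: AnsiString;
  u: UnicodeString;
begin
  tagged := MakeBytes(CP_UTF8);     // dynamic code page matches the data
  mislabelled := MakeBytes(28591);  // simulates a Latin-1 default code page

  u := tagged;        // converted from UTF-8 to UTF-16
  writeln('tagged:      ', Length(u), ' UTF-16 code unit(s)');

  u := mislabelled;   // each byte is read as a separate Latin-1 character
  writeln('mislabelled: ', Length(u), ' UTF-16 code unit(s)');
end.
```

The correctly tagged string should end up as one UTF-16 code unit (U+00E9), the mislabelled one as two (U+00C3 and U+00A9), which is the corruption described above.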

For such code, which is not yet code-page-aware, the default situation is neither worse nor better than it was in previous FPC versions: exactly the same would happen there. However, in FPC 3.x you can generally fix it by calling SetMultiByteConversionCodePage() to change the default code page for ansistrings to what you know/want their encoding to be, like Lazarus does.
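
A minimal sketch of that workaround, assuming FPC 3.x, cwstring on Unix, and that the call is made before any of the affected strings are created. The string here is deliberately never labelled with SetCodePage, to mimic code that is not code-page-aware.

```pascal
program default_cp_demo;
{$mode objfpc}{$h+}
{$ifdef unix}uses cwstring;{$endif}

var
  s: AnsiString;
  u: UnicodeString;
begin
  // Declare that ansistrings with code page CP_ACP hold UTF-8 from now on.
  SetMultiByteConversionCodePage(CP_UTF8);

  // Untagged data, as code-page-unaware code would produce it:
  // the UTF-8 bytes of U+00E9, with no explicit SetCodePage call.
  SetLength(s, 2);
  s[1] := #$C3;
  s[2] := #$A9;

  u := s;   // converted using the new default code page
  writeln('cp=', StringCodePage(s), ' UTF-16 code units=', Length(u));
end.
```

With the default changed to UTF-8, the assignment should yield a single UTF-16 code unit instead of two.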

Moreover, all of this is completely independent of {$modeswitch unicodestrings}, since that is just a shortcut to make String an alias for UnicodeString in the current compilation module (and Char for WideChar, and PChar for PWideChar).
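
As a small illustration of the aliasing (a sketch assuming FPC 3.x; {$h+} is given explicitly so that String is a reference-counted string type):

```pascal
program modeswitch_demo;
{$mode objfpc}{$h+}
{$modeswitch unicodestrings}

var
  s: String;   // an alias for UnicodeString in this module
begin
  s := 'abc';
  // Both sizes are those of a WideChar when the modeswitch is active.
  writeln('SizeOf(Char) = ', SizeOf(Char));
  writeln('SizeOf(s[1]) = ', SizeOf(s[1]));
end.
```

Units compiled without the modeswitch keep their own meaning of String, so an ansistring created there is unaffected by the switch in this module.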


Jonas
_______________________________________________
fpc-pascal maillist  -  fpc-pascal@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
