Graeme Geldenhuys wrote:
On 2016-05-09 17:40, Mark Morgan Lloyd wrote:
> What, /exactly/, are you saying can be lost, and under what circumstances?
You lose “data” because a codepage-based AnsiString (aka the String type) does not
always support all the code points that UTF8String or UnicodeString data can hold.
For example: I write a program that assigns a UnicodeString value to an AnsiString
variable. My program uses compiler mode OBJFPC and {$H+}. I run that same
executable on two different Linux systems. NOTE: it's the same executable.
Linux Box #1: The default codepage is UTF-8, thus String equals
AnsiString(65001). No data is lost when converting from UnicodeString to
String on this system. Essentially it’s a conversion from UTF-16 to UTF-8, and
both support the full Unicode range.
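To illustrate Box #1 outside FPC, here is a rough Python analogy (an assumption for illustration, not FPC's own conversion code): a hypothetical sample string is stored as UTF-16, the way UnicodeString holds it, and then re-encoded as UTF-8, the way AnsiString(65001) would hold it. Nothing is lost in either direction.

```python
# Hypothetical sample: Latin-1, BMP and supplementary-plane code points.
sample = "caf\u00e9 \u20ac \u4e2d \U0001F600"  # e-acute, euro sign, CJK ideograph, emoji

utf16 = sample.encode("utf-16-le")                # roughly how UnicodeString stores it
utf8 = utf16.decode("utf-16-le").encode("utf-8")  # the String = AnsiString(65001) side
round_trip = utf8.decode("utf-8")

print(round_trip == sample)    # True
print(len(utf16), len(utf8))   # 22 18 (byte counts differ, content does not)
```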
Linux Box #2: Here Linux has been set up with a default codepage of ISO-8859-1
(Latin 1). I have a UnicodeString variable which contains BMP and Planes 1-12 code
points. The program assigns that to my String type variable, which is actually
AnsiString(<latin1>). Only the first 256 (U+0000 to U+00FF) of the roughly 1.1 million
Unicode code points will be converted. All the others will be replaced by a '?'
symbol. A massive data loss, and that data could be critical.
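The Latin-1 case can be sketched the same way. Again this is only a Python analogy, assuming the implicit UnicodeString to AnsiString(<latin1>) conversion behaves like encoding to ISO-8859-1 with '?' as the replacement character; assign_to_codepage and fits_latin1 are hypothetical helpers, not FPC routines.

```python
def assign_to_codepage(text: str, codepage: str) -> str:
    """Hypothetical stand-in for String := UnicodeString on a non-UTF-8 system:
    encode into the 8-bit codepage, writing '?' for every code point that
    does not fit, then decode back so the result can be compared."""
    return text.encode(codepage, errors="replace").decode(codepage)


def fits_latin1(cp: int) -> bool:
    """True if code point cp has an ISO-8859-1 representation."""
    try:
        chr(cp).encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


sample = "caf\u00e9 \u20ac \u4e2d \U0001F600"  # e-acute, euro sign, CJK ideograph, emoji
latin1 = assign_to_codepage(sample, "latin-1")
# Python counts code points; FPC works on UTF-16, so a surrogate pair
# such as the emoji may not map to exactly one '?' there.
lost = sum(1 for a, b in zip(sample, latin1) if a != b)
print(latin1)  # café ? ? ?
print(lost)    # 3

# How many of the 0x110000 Unicode code points survive at all?
representable = sum(1 for cp in range(0x110000) if fits_latin1(cp))
print(representable, 0x110000)  # 256 1114112
```

The 256 survivors are exactly U+0000 to U+00FF; everything else collapses to '?', so the original text cannot be recovered after the assignment.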
What does FPC do about this? It only gives you a compiler warning when the
application is compiled, but still generates the executable as normal.
I now fully understand why Delphi 2009 and later use UnicodeString as the
default type, so that there String = UnicodeString = UTF-16. It defaults to UTF-16 on
all its supported platforms (granted, Delphi supports a lot fewer platforms than
FPC does). At least Delphi protects the developers who still use the
String type everywhere (remember, String = UnicodeString there). Much safer than
what FPC 3.x does now!
Now some would say, simply switch your compiler mode to DelphiUnicode. But I
don't want to do that, because I like the stricter ObjFPC mode, and prefer
ObjFPC's syntax.
So which of these are you complaining about:
a) AnsiString doesn't support codepoints > 0xff ?
b) AnsiString doesn't support codepoints > 0x7f ?
c) AnsiString might apply an inappropriate translation for a codepoint
<= 0x7f ?
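Options (b) and (c) turn on where codepages agree. A small Python check, using its latin-1, cp1252 and utf-8 codecs as stand-ins for typical AnsiString codepages (an assumption; not every codepage is ASCII-compatible), shows that these three agree on 0x00 to 0x7F but already diverge at 0x80:

```python
low = bytes(range(0x80))

# Bytes 0x00..0x7F decode to identical characters in all three encodings.
same_low = low.decode("latin-1") == low.decode("cp1252") == low.decode("utf-8")
print(same_low)  # True

# Byte 0x80 already differs: the euro sign in cp1252, a C1 control in ISO-8859-1.
euro_cp1252 = ord(b"\x80".decode("cp1252"))
c1_latin1 = ord(b"\x80".decode("latin-1"))
print(hex(euro_cp1252), hex(c1_latin1))  # 0x20ac 0x80
```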
--
Mark Morgan Lloyd
markMLl .AT. telemetry.co .DOT. uk
[Opinions above are the author's, not those of his employers or colleagues]
_______________________________________________
fpc-pascal maillist - fpc-pascal@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal