Re: [fpc-pascal] code example where AnsiString used in FCL (SqlDB) causes data loss

Jonas Maebe Wed, 11 May 2016 07:46:29 -0700


Andreas Dorn wrote on Wed, 11 May 2016:

All in all Graeme is right. FPC looks pretty much broken to me, too.
For my projects, I pulled the emergency brake on anything FPC.
 
For me, the most serious flaws of FPC 3.0 are:
- assuming that it's possible to assign an encoding to every string
- using an (unsafe) guess about the encoding for auto-conversions

Do you have code that works correctly in FPC 2.6.x, but not in FPC 3.0? If so, can you please post it or file bug reports? Again: the main focus when designing all of this new functionality was backward compatibility: existing code that uses plain string/shortstring/ansistring/unicodestring/char/widechar/unicodechar/pchar/pwidechar/punicodechar should have the same behaviour in FPC 3.0 as in previous FPC versions if you don't make any changes. And in virtually all cases it does (the utf8string type being a notable exception).

Some examples:
1) String buffers
Split a UTF-8 string into chunks of 1024 bytes. Trying to assign an encoding to
those chunks and allowing auto-conversions will just lead to corruption.
 
Where has the string type for string buffers gone?

There never was any, but as long as you don't try to convert strings containing such arbitrary data from one code page to another (by either calling setcodepage() or by assigning them from a string with declared code page X to a string with declared code page Y), no conversions will happen.
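A minimal, language-neutral sketch of the chunking case (written in Python rather than Pascal; the helper names are hypothetical): cutting UTF-8 data at fixed 1024-byte boundaries can split a multi-byte character, so decoding each chunk on its own corrupts the text, while keeping the chunks as raw bytes and decoding only after reassembly is lossless.

```python
from __future__ import annotations

CHUNK_SIZE = 1024  # chunk size from the example above


def split_into_chunks(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Split a byte buffer into consecutive chunks of at most `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def decode_each_chunk(chunks: list[bytes]) -> str:
    """Naive approach: treat every chunk as a complete UTF-8 string.

    A multi-byte sequence cut at a chunk boundary is invalid on its own,
    so errors="replace" substitutes U+FFFD and the text is corrupted.
    """
    return "".join(chunk.decode("utf-8", errors="replace") for chunk in chunks)


def reassemble_then_decode(chunks: list[bytes]) -> str:
    """Lossless approach: no conversion until the buffer is whole again."""
    return b"".join(chunks).decode("utf-8")


if __name__ == "__main__":
    # 1023 ASCII bytes, then "é" (2 bytes in UTF-8), so the 1024-byte
    # boundary falls in the middle of the "é" sequence.
    text = "a" * 1023 + "é" + "b"
    chunks = split_into_chunks(text.encode("utf-8"))
    print(len(chunks))                             # 2
    print(reassemble_then_decode(chunks) == text)  # True
    print(decode_each_chunk(chunks) == text)       # False
```

The chunks themselves are never the problem; the damage only appears at the point where a chunk is interpreted as text in some encoding.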

2) Most programming languages out there use something like a "sequence of
UTF-16 code units" as a string type. (That's not the same as a UTF-16 string!)
It's a proper string type for a "UTF-16 buffer"; pretty much nobody out
there uses a low-level string type that assumes the content is a complete
UTF-16 string.

The meaning of UnicodeString has not changed in FPC 3.0 compared to previous FPC versions, nor has the way it is converted to/from other string types. You can argue it was broken from the start, but that's unrelated to the present animosity that's getting vented about FPC 3.0.
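To make the distinction in point 2 concrete, here is a small sketch (Python, hypothetical helper names) that checks whether a sequence of 16-bit code units is well-formed UTF-16, i.e. whether every surrogate is correctly paired. Python's strict UTF-16 decoder serves as a cross-check.

```python
from __future__ import annotations

import struct


def is_well_formed_utf16(units: list[int]) -> bool:
    """True if every surrogate in the 16-bit code units is correctly paired."""
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # A high surrogate must be followed immediately by a low surrogate.
            if i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
                i += 2
                continue
            return False
        if 0xDC00 <= u <= 0xDFFF:
            # A low surrogate without a preceding high surrogate.
            return False
        i += 1
    return True


def decodes_as_strict_utf16(units: list[int]) -> bool:
    """Cross-check: does Python's strict UTF-16LE decoder accept the units?"""
    raw = struct.pack("<%dH" % len(units), *units)
    try:
        raw.decode("utf-16-le")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    samples = [
        [0x0041, 0xD83D, 0xDE00],  # "A" + U+1F600: well-formed
        [0x0041, 0xD83D],          # dangling high surrogate
        [0xDE00, 0xD83D],          # surrogates in the wrong order
    ]
    for units in samples:
        print([hex(u) for u in units],
              is_well_formed_utf16(units), decodes_as_strict_utf16(units))
```

Both of the last two samples are perfectly valid "sequences of UTF-16 code units", yet neither is a valid UTF-16 string.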

3) Filenames on Windows
You can't convert an arbitrary filename on Windows to UTF-8 and back without
data loss. There simply isn't any encoding that correctly fits all possible
filenames.

We only auto-convert Windows file names from UTF-16 to anything else if you use non-unicodestring/widestring variables with the file name APIs. If you consistently use unicodestring/widestring, no conversion will happen (except with APIs that have not been converted yet, such as classes).
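A sketch of the filename round trip (Python; the helper names and the sample names are hypothetical). Windows file names are sequences of 16-bit code units that are not required to be valid UTF-16, so a name containing an unpaired surrogate cannot survive a trip through strict UTF-8, whereas a well-formed name does.

```python
from __future__ import annotations

import struct


def units_to_utf8(units: list[int]) -> bytes:
    """UTF-16 code units -> UTF-8; unpaired surrogates become U+FFFD."""
    raw = struct.pack("<%dH" % len(units), *units)
    return raw.decode("utf-16-le", errors="replace").encode("utf-8")


def utf8_to_units(data: bytes) -> list[int]:
    """UTF-8 -> UTF-16 code units."""
    raw = data.decode("utf-8").encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(raw) // 2), raw))


def round_trips(units: list[int]) -> bool:
    """Does UTF-16 -> UTF-8 -> UTF-16 give back exactly the same units?"""
    return utf8_to_units(units_to_utf8(units)) == units


if __name__ == "__main__":
    valid = [ord(c) for c in "café.txt"]
    # Hypothetical name containing an unpaired high surrogate.
    broken = [ord(c) for c in "report"] + [0xD800] + [ord(c) for c in ".txt"]
    print(round_trips(valid))   # True
    print(round_trips(broken))  # False
```

Keeping the name in a UTF-16 based type for its whole lifetime never enters this conversion path, which is why it stays lossless.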

A lot of APIs use buffers. You can try to assign an encoding to a buffer,
but if you use that encoding to auto-convert anything, you have made a
blatant mistake. Assuming that anything from the outside world
(Windows API, C#, Java...) is UTF-16 is yet another blatant mistake...

Maybe we should add support for "WTF-8" like in Rust: https://github.com/rust-lang/rust/issues/12056
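For readers unfamiliar with it, here is a rough sketch (Python, hypothetical function names, not a reference implementation) of the idea behind the WTF-8 scheme referenced in the linked Rust issue. Correctly paired surrogates are encoded exactly as in UTF-8, while unpaired surrogates get the 3-byte form that strict UTF-8 forbids, so arbitrary 16-bit code unit sequences such as Windows file names can be stored losslessly in an 8-bit buffer.

```python
from __future__ import annotations


def wtf8_encode(units: list[int]) -> bytes:
    """Encode a sequence of 16-bit code units as WTF-8."""
    out = bytearray()
    i = 0
    while i < len(units):
        u = units[i]
        if (0xD800 <= u <= 0xDBFF and i + 1 < len(units)
                and 0xDC00 <= units[i + 1] <= 0xDFFF):
            # Paired surrogates: combine into one supplementary code point.
            cp = 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            i += 2
        else:
            # Ordinary BMP code unit or an unpaired surrogate.
            cp = u
            i += 1
        # "surrogatepass" lets Python emit the 3-byte form of a lone surrogate.
        out += chr(cp).encode("utf-8", "surrogatepass")
    return bytes(out)


def wtf8_decode(data: bytes) -> list[int]:
    """Inverse of wtf8_encode: recover the original 16-bit code units."""
    units: list[int] = []
    for ch in data.decode("utf-8", "surrogatepass"):
        cp = ord(ch)
        if cp >= 0x10000:
            # Split a supplementary code point back into a surrogate pair.
            cp -= 0x10000
            units += [0xD800 + (cp >> 10), 0xDC00 + (cp & 0x3FF)]
        else:
            units.append(cp)
    return units
```

For well-formed input the output is byte-for-byte ordinary UTF-8; only unpaired surrogates produce sequences a strict UTF-8 decoder would reject.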

4) Some barcodes,

I would not consider these to be strings, but other than that, the same holds as for string buffers above.

5) Various file-format standards,


Idem.

6) Anything that uses ASCII plus some control bytes for communication,


Idem.
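The same "keep it as bytes" approach applies to control-byte protocols. A minimal sketch in Python, assuming a hypothetical framing in which each message is delimited by STX (0x02) and ETX (0x03): frames are cut out of a byte buffer without ever decoding it, so payload bytes that are not valid in any text encoding pass through unchanged.

```python
from __future__ import annotations

STX, ETX = 0x02, 0x03  # hypothetical start/end-of-frame control bytes


def extract_frames(buffer: bytes) -> tuple[list[bytes], bytes]:
    """Return complete STX...ETX payloads and the unconsumed remainder.

    Bytes outside a frame are skipped; an unterminated frame at the end is
    returned as the remainder so it can be completed by the next read.
    """
    frames: list[bytes] = []
    pos = 0
    while True:
        start = buffer.find(STX, pos)
        if start == -1:
            return frames, b""
        end = buffer.find(ETX, start + 1)
        if end == -1:
            return frames, buffer[start:]
        frames.append(buffer[start + 1:end])  # payload stays raw bytes
        pos = end + 1


if __name__ == "__main__":
    data = b"\x02HELLO\x03junk\x02\xff\xfe\x03\x02PART"
    frames, rest = extract_frames(data)
    print(frames)  # [b'HELLO', b'\xff\xfe']
    print(rest)    # b'\x02PART'
```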

7) Some encodings used in databases, ...
All of that won't fit into the FPC scheme of 'known encodings'.
 

The most obvious showstoppers for FPC 3.0 are:
FPC 3.0 doesn't have a useful type for string buffers.

Use arrays, like in any other programming language. If you insist on using strings, simply stick to consistently using a single string type.

FPC 3.0 doesn't have a useful type for filenames

Use UnicodeString: as long as you do not assign it to another string type, it won't get converted.

FPC 3.0 adds unsafe auto-conversions


Where/when?


Jonas
_______________________________________________
fpc-pascal maillist  -  fpc-pascal@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
