Andreas Dorn wrote on Wed, 11 May 2016:
All in all Graeme is right. FPC looks pretty much broken to me, too. For my projects I pulled the emergency brake on anything FPC. The most serious flaws of FPC 3.0 for me are:
- assuming that it's possible to assign an encoding to every string
- using an (unsafe) guess about the encoding for auto-conversions
Do you have code that works correctly in FPC 2.6.x, but not in FPC 3.0? If so, can you please post it or file bug reports? Again: the main focus when designing all of this new functionality was backward compatibility: existing code that uses plain string/shortstring/ansistring/unicodestring/char/widechar/unicodechar/pchar/pwidechar/punicodechar should have the same behaviour in FPC 3.0 as in previous FPC versions if you don't make any changes. And in virtually all cases it does (the utf8string type being a notable exception).
Some examples:

1) String buffers
Split a UTF-8 string into chunks of 1024 bytes. Trying to assign an encoding to those chunks and allowing auto-conversions will just lead to corruption. Where has the string type for string buffers gone?
There never was any, but as long as you don't try to convert strings containing such arbitrary data from one code page to another (by either calling setcodepage() or by assigning them from a string with declared code page X to a string with declared code page Y), no conversions will happen.
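To make the point above concrete, here is a minimal sketch (not taken from the original thread), assuming FPC 3.0's code-page-aware strings and the System unit routines SetCodePage and StringCodePage. The program and variable names are made up for illustration. It labels a deliberately incomplete UTF-8 byte sequence as CP_UTF8 without converting it, then assigns it to a second variable of the same type; under these assumptions the bytes should come out unchanged.

```pascal
program CodePageDemo;
{$mode objfpc}{$H+}

var
  Chunk, Same: RawByteString;   // hypothetical names, for illustration only
begin
  // First two bytes of the three-byte UTF-8 encoding of U+20AC:
  // on purpose NOT a complete UTF-8 sequence, like a buffer cut mid-character.
  SetLength(Chunk, 2);
  Chunk[1] := #$E2;
  Chunk[2] := #$82;

  // Change only the declared code page; Convert = False leaves the bytes alone.
  SetCodePage(Chunk, CP_UTF8, False);

  // Same string type on both sides, so no code page conversion is involved.
  Same := Chunk;

  WriteLn('declared code page: ', StringCodePage(Same));
  WriteLn('length in bytes:    ', Length(Same));
  WriteLn('bytes:              ', Ord(Same[1]), ' ', Ord(Same[2]));
end.
```

By contrast, assigning Chunk to a string variable declared with a different code page, or calling SetCodePage with Convert = True, is what would trigger a conversion, and that is the step to avoid for arbitrary byte data.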
2) Most programming languages out there use something like "sequence of UTF-16 code units" as a string type. (That's not the same as a UTF-16 string!) It's a proper string type for a "UTF-16 buffer": pretty much nobody out there uses a low-level string type that assumes that the content is a complete UTF-16 string.
The meaning of UnicodeString has not changed in FPC 3.0 compared to previous FPC versions, nor has the way it is converted to/from other string types. You can argue it was broken from the start, but that's unrelated to the present animosity that's getting vented about FPC 3.0.
3) Filenames on Windows
You can't convert any random filename on Windows to UTF-8 and back without data loss. There simply isn't any encoding that correctly fits all possible filenames.
We only auto-convert Windows file names from UTF-16 to anything else if you use non-unicodestring/widestring variables with the file name APIs. If you consistently use unicodestring/widestring, no conversion will happen (except with not yet converted APIs, such as classes).
A lot of APIs use buffers. You can try to assign an encoding to a buffer, but if you use that encoding to auto-convert anything you made a blatant mistake. Assuming that anything from the outside world (WindowsAPI, C#, Java...) is UTF-16 is yet another blatant mistake...
Maybe we should add support for "WTF-8" like in Rust: https://github.com/rust-lang/rust/issues/12056
4) some Barcodes,
I would not consider these to be strings, but other than that the same holds as for String Buffers above.
5) Various File-Format-Standards,
Idem.
6) anything that uses ASCII + some Control-Bytes for communication,
Idem.
7) some encodings used in databases, ...

All of that won't fit into the FPC scheme of 'known encodings'.

The most obvious showstoppers for FPC 3.0 are:

FPC 3.0 doesn't have a useful type for string buffers.
Use arrays, like in any other programming language. If you insist on using strings, simply stick to consistently using a single string type.
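One way the "use arrays" advice could look, as a minimal sketch rather than an established idiom: SplitIntoChunks, TChunkList and Buffer are hypothetical names introduced here, and TBytes is the dynamic byte array type from SysUtils. Because the chunks are plain byte arrays, they carry no code page and no string conversion can apply to them.

```pascal
program ByteChunks;
{$mode objfpc}{$H+}

uses
  SysUtils;  // for TBytes

type
  TChunkList = array of TBytes;  // hypothetical helper type

// Split Data into consecutive pieces of at most ChunkSize bytes.
// The split is purely byte-based and may fall in the middle of a
// multi-byte UTF-8 sequence, which is harmless for raw byte arrays.
function SplitIntoChunks(const Data: TBytes; ChunkSize: Integer): TChunkList;
var
  i, Count, Len: Integer;
begin
  Result := nil;
  if (ChunkSize <= 0) or (Length(Data) = 0) then
    Exit;
  Count := (Length(Data) + ChunkSize - 1) div ChunkSize;
  SetLength(Result, Count);
  for i := 0 to Count - 1 do
  begin
    Len := Length(Data) - i * ChunkSize;
    if Len > ChunkSize then
      Len := ChunkSize;
    SetLength(Result[i], Len);
    Move(Data[i * ChunkSize], Result[i][0], Len);
  end;
end;

var
  Buffer: TBytes;
  Chunks: TChunkList;
  i: Integer;
begin
  // Stand-in for 2500 bytes of UTF-8 data read from somewhere.
  SetLength(Buffer, 2500);
  for i := 0 to High(Buffer) do
    Buffer[i] := i mod 256;

  Chunks := SplitIntoChunks(Buffer, 1024);
  for i := 0 to High(Chunks) do
    WriteLn('chunk ', i, ': ', Length(Chunks[i]), ' bytes');
end.
```

Decoding to a string type would then happen only once, after the chunks have been reassembled into a complete byte sequence.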
FPC 3.0 doesn't have a useful type for Filenames
Use UnicodeString: as long as you do not assign it to another string type, it won't get converted.
FPC 3.0 adds unsafe auto-conversions
Where/when?

Jonas