Andreas Dorn wrote on Wed, 11 May 2016:

All in all, Graeme is right. FPC looks pretty much broken to me, too.
For my projects I pulled the emergency brake on anything FPC.
 
The most serious flaws for me of FPC 3.0 are:
- assuming that it's possible to assign an encoding to every string
- using an (unsafe) guess about the encoding for auto-conversions

Do you have code that works correctly in FPC 2.6.x, but not in FPC 3.0? If so, can you please post it or file bug reports? Again: the main focus when designing all of this new functionality was backward compatibility: existing code that uses plain string/shortstring/ansistring/unicodestring/char/widechar/unicodechar/pchar/pwidechar/punicodechar should have the same behaviour in FPC 3.0 as in previous FPC versions if you don't make any changes. And in virtually all cases it does (the utf8string type being a notable exception).

Some examples:
1) String-Buffers
Split a UTF-8 string into chunks of 1024 bytes. Trying to assign an encoding to those chunks and allowing auto-conversions will just lead to corruption.
 
Where has the string type for string buffers gone?

There never was any, but as long as you don't try to convert strings containing such arbitrary data from one code page to another (by either calling setcodepage() or by assigning them from a string with declared code page X to a string with declared code page Y), no conversions will happen.
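
The string-buffer point can be illustrated in a language-neutral way. The sketch below is Python, not FPC code, and the sample text and variable names are assumptions made up for the illustration. It shows that fixed-size chunks of UTF-8 data are usually not valid UTF-8 on their own, that they reassemble losslessly as long as they are treated as opaque bytes, and that forcing a conversion on a single chunk damages it.

```python
# Language-neutral illustration in Python; this is not FPC code.
text = "€" * 1000                 # hypothetical sample: each "€" is 3 bytes in UTF-8
data = text.encode("utf-8")       # 3000 bytes in total

CHUNK = 1024
chunks = [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)]


def decodes_cleanly(chunk: bytes) -> bool:
    """Return True if the chunk on its own is valid UTF-8."""
    try:
        chunk.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# 1024 is not a multiple of 3, so chunk boundaries fall inside characters.
broken = [i for i, c in enumerate(chunks) if not decodes_cleanly(c)]

# Treated as opaque bytes, the chunks reassemble losslessly.
reassembled = b"".join(chunks)

# Forcing a (lossy) conversion on a single chunk changes its bytes.
damaged = chunks[0].decode("utf-8", errors="replace").encode("utf-8")

print(broken, reassembled == data, damaged == chunks[0])
```

The same reasoning applies to any string type that carries a declared code page: the data stays intact only as long as nothing triggers a conversion.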

2) Most programming languages out there use something like a "sequence of UTF-16 codepoints" as a string type.
(That's not the same as a UTF-16 string!)
It's a proper string type for a "UTF-16 buffer": pretty much nobody out there uses a low-level string type that assumes that the content is a complete UTF-16 string.

The meaning of UnicodeString has not changed in FPC 3.0 compared to previous FPC versions, nor has the way it is converted to/from other string types. You can argue it was broken from the start, but that's unrelated to the present animosity that's getting vented about FPC 3.0.

3) Filenames on Windows
You can't convert any random filename on Windows to UTF-8 and back without data loss.
There simply isn't any encoding that correctly fits all possible filenames.

We only auto-convert Windows file names from UTF-16 to anything else if you use non-unicodestring/widestring variables with the file name APIs. If you consistently use unicodestring/widestring, no conversion will happen (except with not yet converted APIs, such as classes).

A lot of APIs use buffers. You can try to assign an encoding to a buffer, but if you use that encoding to auto-convert anything, you have made a blatant mistake. Assuming that anything from the outside world (Windows API, C#, Java...) is UTF-16 is yet another blatant mistake...

Maybe we should add support for "WTF-8" like in Rust: https://github.com/rust-lang/rust/issues/12056
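
To make the filename problem concrete, here is a small sketch, again in Python rather than FPC, with a made-up file name. A Windows file name is a sequence of 16-bit units and may contain an unpaired surrogate. Strict UTF-8 cannot represent that, a lossy conversion silently changes the name, and a relaxed encoding that passes lone surrogates through round-trips it. Python's "surrogatepass" error handler is used here only as something similar in spirit to WTF-8; it is not the WTF-8 specification itself.

```python
# Language-neutral illustration in Python; this is not FPC code.
# Hypothetical file name containing a lone high surrogate (U+D800).
name = "report_\ud800.txt"

# Strict UTF-8 refuses to encode an unpaired surrogate.
try:
    name.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# A lossy conversion "works" but yields a different file name.
lossy = name.encode("utf-8", errors="replace").decode("utf-8")

# A relaxed encoding that lets lone surrogates through round-trips the name.
relaxed = name.encode("utf-8", errors="surrogatepass")
round_trip = relaxed.decode("utf-8", errors="surrogatepass")

print(strict_ok, lossy == name, round_trip == name)
```

This is the case that is avoided by keeping such names in a 16-bit string type from end to end, so that no conversion is ever attempted.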

4) some Barcodes,

I would not consider these to be strings, but other than that the same holds as for String Buffers above.

5) Various File-Format-Standards,

Idem.

6) anything that uses ASCII + some Control-Bytes for communication,

Idem.

7) some encodings used in databases, ...
None of that will fit into the FPC scheme of 'known encodings'.
 

The most obvious showstoppers for FPC 3.0 are:
FPC 3.0 doesn't have a useful type for string-buffers.

Use arrays, like in any other programming language. If you insist on using strings, simply stick to consistently using a single string type.

FPC 3.0 doesn't have a useful type for Filenames

Use UnicodeString: as long as you do not assign it to another string type, it won't get converted.

FPC 3.0 adds unsafe auto-conversions

Where/when?


Jonas
