On 12/1/24 9:02 AM, Adriaan van Os via fpc-pascal wrote:
Hairy Pixels via fpc-pascal wrote:
ChatGPT says I can print Unicode scalars like this, but I don't
see it working, and there are no compiler warnings either. Did it make
this up, or did I do something wrong?
Writeln('Unicode scalar 1F496: ', #$1F496); // 💖
Writeln('Unicode scalar 1F496: ', WideChar($1F496)); // 💖
What people call "Unicode", even in compiler manuals, is not Unicode. I
repeat it again and again: it is not Unicode but so-called
"Unicode". Microsoft, and those who want to be compatible with it, use
UTF-16 <https://en.wikipedia.org/wiki/UTF-16> treated as if it were
UCS-2 <https://en.wikipedia.org/wiki/Universal_Coded_Character_Set>.
They call that "Unicode", which is plain nonsense. In the real world,
one cannot stuff 21 bits into 16 bits.
For heaven's sake, let's stop talking about so-called "Unicode" and
instead use UTF-8 <https://en.wikipedia.org/wiki/UTf-8> or UTF-32
<https://en.wikipedia.org/wiki/UCS-4>.
Here's how Free Pascal types map to Unicode terminology:
WideChar = UTF-16 code unit
UnicodeString = UTF-16 encoded string
WideString = UTF-16 encoded string. On Windows it's not reference
counted, because it is used for COM compatibility. On other platforms,
it's the same as UnicodeString.
UTF8String = UTF-8 encoded string. Defined as UTF8String=type
AnsiString(CP_UTF8).
UTF16String = alias for UnicodeString
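
Since WideChar is a single UTF-16 code unit, a code point above $FFFF such
as U+1F496 has to be written as two WideChars (a surrogate pair). Here is a
minimal sketch of that, using the standard UTF-16 surrogate arithmetic. The
program and variable names are made up for illustration, and whether the
glyph actually shows up depends on your terminal and font.

```pascal
program SurrogateDemo;
{$mode objfpc}{$H+}

uses
  {$ifdef unix}
  cwstring, // widestring manager, needed to convert S for output on Unix
  {$endif}
  SysUtils;

const
  CodePoint = $1F496; // outside the 16-bit range, so one WideChar cannot hold it

var
  HighSur, LowSur: WideChar; // illustrative names
  S: UnicodeString;
begin
  // Standard UTF-16 encoding of a code point above $FFFF:
  // subtract $10000, then split the remaining 20 bits into two halves.
  HighSur := WideChar($D800 + ((CodePoint - $10000) shr 10));
  LowSur  := WideChar($DC00 + ((CodePoint - $10000) and $3FF));

  // Build a two-code-unit string from the pair.
  SetLength(S, 2);
  S[1] := HighSur;
  S[2] := LowSur;

  Writeln('Code units: ', IntToHex(Ord(HighSur), 4), ' ', IntToHex(Ord(LowSur), 4));
  Writeln('Length(S):  ', Length(S));
  Writeln('Text:       ', S);
end.
```

For U+1F496 the two code units come out as D83D and DC96, and Length(S) is 2,
because Length counts UTF-16 code units, not characters.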
Hope this clears things up.
Another thing:
For conversions between different encodings to work (e.g. between UTF-8
and UTF-16), you need to install a widestring manager. Some platforms
(like Windows) always include one by default, but others (e.g. Linux)
don't, to avoid bloating programs that don't need it. On those, you
need to add a unit that installs one, such as cwstring, to the uses
clause of your program.
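
A rough sketch of what that looks like in practice, assuming a Unix-like
target where cwstring is available (the program and variable names are just
for illustration). The UTF-8 bytes are filled in one by one so that the
example does not depend on the source file's code page.

```pascal
program ConvertDemo;
{$mode objfpc}{$H+}

uses
  {$ifdef unix}
  cwstring, // installs a widestring manager; usually listed first in the main program
  {$endif}
  SysUtils;

var
  U8: UTF8String;
  U16: UnicodeString;
  I: Integer;
begin
  // The four UTF-8 bytes that encode U+1F496.
  SetLength(U8, 4);
  U8[1] := #$F0;
  U8[2] := #$9F;
  U8[3] := #$92;
  U8[4] := #$96;

  // UTF-8 to UTF-16 conversion; this is the step that relies on
  // the installed widestring manager.
  U16 := UnicodeString(U8);

  Writeln('UTF-8 length (bytes):       ', Length(U8));
  Writeln('UTF-16 length (code units): ', Length(U16));
  for I := 1 to Length(U16) do
    Writeln('  code unit ', I, ': $', IntToHex(Ord(U16[I]), 4));
end.
```

With a working widestring manager the UTF-16 string should hold the two
surrogate code units. Without one, expect the conversion to produce
replacement characters such as '?' instead.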
Nikolay