On 12/1/24 9:02 AM, Adriaan van Os via fpc-pascal wrote:
Hairy Pixels via fpc-pascal wrote:
ChatGPT says I can print Unicode scalars like this, but it doesn't work and there are no compiler warnings either. Did it make this up, or did I do something wrong?

  Writeln('Unicode scalar 1F496: ', #$1F496); // 💖
  Writeln('Unicode scalar 1F496: ', WideChar($1F496));  // 💖

What people call "Unicode", even in compiler manuals, is not "Unicode". I repeat it again and again: it is not "Unicode" but so-called "Unicode". Microsoft, and those who want to be compatible with it, use UTF-16 <https://en.wikipedia.org/wiki/UTF-16> treated as if it were UCS-2 <https://en.wikipedia.org/wiki/Universal_Coded_Character_Set>.

They call that "Unicode", which is plain nonsense. In the real world, one cannot stuff 21 bits into 16 bits.

For heaven's sake, let's stop talking about so-called "Unicode" and instead use UTF-8 <https://en.wikipedia.org/wiki/UTf-8> or UTF-32 <https://en.wikipedia.org/wiki/UCS-4>.

Here's how Free Pascal types map to Unicode terminology:

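WideChar = UTF-16 code unit (16 bits). A scalar above U+FFFF, such as U+1F496, does not fit in a single WideChar and has to be written as a surrogate pair of two WideChars.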

UnicodeString = UTF-16 encoded string

WideString = UTF-16 encoded string. On Windows it is not reference counted, because it is used for COM compatibility. On other platforms, it is the same as UnicodeString.

UTF8String = UTF-8 encoded string. Defined as UTF8String=type AnsiString(CP_UTF8).

UTF16String = alias for UnicodeString
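
To tie this back to the original question: since a WideChar is a single UTF-16 code unit, U+1F496 has to be stored as two of them. Here is a minimal sketch of that arithmetic. The function name ScalarToUTF16 is made up for illustration (it is not an RTL routine), and it does no validation of its input.

```pascal
program SurrogateDemo;
{$mode objfpc}{$H+}

// Hypothetical helper: encode a Unicode scalar value as UTF-16.
// Values below $10000 take one code unit, the rest take a surrogate pair.
// No validation is done (e.g. for values above $10FFFF).
function ScalarToUTF16(CodePoint: Cardinal): UnicodeString;
begin
  if CodePoint < $10000 then
    Result := WideChar(CodePoint)
  else
  begin
    Dec(CodePoint, $10000);                                     // 20 bits remain
    Result := WideChar($D800 + (CodePoint shr 10));             // high surrogate: top 10 bits
    Result := Result + WideChar($DC00 + (CodePoint and $3FF));  // low surrogate: bottom 10 bits
  end;
end;

var
  S: UnicodeString;
begin
  S := ScalarToUTF16($1F496);
  Writeln(Length(S));                                        // 2 (code units, not characters)
  Writeln(HexStr(Ord(S[1]), 4), ' ', HexStr(Ord(S[2]), 4));  // D83D DC96
end.
```

Whether the resulting string then shows up as a heart on your terminal is a separate matter, and depends on the console encoding and on the conversion described below.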

Hope this clears things up.


Another thing:

For conversions between different encodings to work (e.g. between UTF-8 and UTF-16), you need to install a widestring manager. Some platforms (like Windows) always include one by default, but others (e.g. Linux) don't, to reduce bloat in programs that don't need it. On those platforms, you may need to add the cwstring unit to your program's uses clause.
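
A rough sketch of what that looks like in practice, assuming a Unix-like target where cwstring is available. The program and variable names are just for illustration. I dump the bytes instead of printing the string, so the result does not depend on the terminal.

```pascal
program ConvDemo;
{$mode objfpc}{$H+}

{$ifdef unix}
uses
  cwstring;   // installs a widestring manager on Unix-like systems
{$endif}

var
  U16: UnicodeString;
  U8: UTF8String;
  I: Integer;
begin
  // U+1F496 as a UTF-16 surrogate pair
  U16 := WideChar($D83D);
  U16 := U16 + WideChar($DC96);

  // Implicit UTF-16 to UTF-8 conversion; this is the step that
  // relies on the installed widestring manager
  U8 := U16;

  // Dump the converted bytes in hex
  for I := 1 to Length(U8) do
    Write(HexStr(Ord(U8[I]), 2), ' ');
  Writeln;
end.
```

A correct UTF-8 encoding of U+1F496 is the four bytes F0 9F 92 96. If you see something else (for example question marks), the conversion most likely ran without a suitable widestring manager.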


Nikolay
_______________________________________________
fpc-pascal maillist  -  fpc-pascal@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
