On 07.05.2017 at 11:57, Graeme Geldenhuys via Lazarus wrote:
> On 2017-05-07 09:10, Florian Klaempfl via Lazarus wrote:
>>> Yeah, that would be the logical thing to do.
>>
>> Why? What makes a string literal UTF-8?
>>
> As Mattias said, the fact that the source unit is UTF-8 encoded, as
> declared by a BOM marker, by -Fcutf8, or by {$codepage utf8}. If the
> source unit is UTF-8 encoded, the literal string constant can't (and
> shouldn't) be in any other encoding.
>
> I would say the same if the source unit were stored in UTF-16
> encoding. Then string literals would be treated as UTF-16.
And if an ISO/ANSI codepage is given? Things would probably fail. The
point is that FPC is consistent in this regard: sources with a given
ISO/ANSI codepage are handled the same way. If a string literal
contains non-ASCII characters, it is converted to UTF-16 using the
codepage of the source. Very simple, very logical. It is a matter of
preference whether UTF-8, -16, or -32 is chosen at this point, but FPC
uses UTF-16; if it used UTF-8, the same problem would simply occur the
other way around. If no codepage is given at all (by directive,
command line, or BOM), string literals are handled byte-wise as raw
strings.

> It's perfectly logical to me.

It is logical only in a limited view.
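For illustration, a minimal sketch of the behaviour described above
(the program name, variable name, and sample literal are mine; the
literal conversion itself is what FPC does once the source codepage
is known):

  program LiteralDemo;
  {$mode objfpc}{$H+}
  {$codepage utf8}  // declare that this source file is UTF-8 encoded

  var
    U: UnicodeString;
  begin
    // With a known source codepage (BOM, -Fcutf8, or the directive
    // above), FPC decodes the literal and stores it as UTF-16.
    U := 'Grüße';
    WriteLn(Length(U));  // prints 5: five UTF-16 code units

    // Without any codepage information, FPC treats the literal
    // byte-wise as a raw string; in an AnsiString, Length would
    // then count bytes, not characters.
  end.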