> Lazarus assumes that an ansistring always contains utf-8. This is not
> generally true.
While this might be true, I think it's a consequence of a shortcoming of FPC, which simply treats the types ANSIString and UTF8String as identical. IMHO a future version should keep track of the encoding of each string type.

I suggest that there should be the native string types ANSIString, UTF8String, UCS2String and UTF16String, together with the appropriate character types ANSIChar, UTF8Char, UCS2Char and UTF16Char (a UTF8Char or UTF16Char would in fact be a short string of the respective type, since one character can span several code units). The compiler and RTL should take care of any conversion between those, and convert string constants appropriately on assignment.
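
To illustrate what I mean by conversion between the types, here is a rough sketch. It is written in Python only because it has the codecs at hand; it is not a proposal for the FPC syntax. The helper names convert and to_ucs2 are made up for this mail, and "latin-1" merely stands in for some single-byte ANSI code page.

```python
def convert(data: bytes, src: str, dst: str) -> bytes:
    """Re-encode bytes from codec `src` to codec `dst` via code points."""
    return data.decode(src).encode(dst)


def to_ucs2(text: str) -> bytes:
    """UCS-2 has no surrogate pairs, so nothing above U+FFFF can be stored."""
    for ch in text:
        if ord(ch) > 0xFFFF:
            raise ValueError(f"U+{ord(ch):04X} does not fit into UCS-2")
    # Within the BMP, UCS-2 and UTF-16 produce the same 16-bit units.
    return text.encode("utf-16-le")


# The word "naive" with a diaeresis on the i, in three encodings.
ansi = "na\u00efve".encode("latin-1")        # one byte per character
utf8 = convert(ansi, "latin-1", "utf-8")     # the accented letter takes two bytes
utf16 = convert(utf8, "utf-8", "utf-16-le")  # two bytes per character
print(len(ansi), len(utf8), len(utf16))      # prints: 5 6 10
```

The point of to_ucs2 is that not every conversion can succeed: a UTF16String holding a character outside the BMP has no UCS2String equivalent, so the RTL would have to define what happens in that case.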

Now there should be compiler options to have the user select which type he wants to use for the generic "String" type (all four applicable) and which he wants to use for the generic WideString type (UCS2String and UTF16String applicable). The generic Char and WideChar types are assigned accordingly.

Moreover, for all native string types this version should provide both Unicode-character-counted ("code point") and code-unit-counted variants of the functions and procedures we know as s[i], length(s), pos(), copy(), delete(), ...
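
To make the difference between the two ways of counting concrete, here is a small sketch (in Python, just for illustration; the function names are invented for this mail and are not existing FPC routines). The same text gives three different "lengths" depending on what is counted.

```python
def code_points(s: str) -> int:
    """Number of Unicode code points (Python's str is counted this way)."""
    return len(s)


def utf8_units(s: str) -> int:
    """Number of UTF-8 code units, i.e. bytes."""
    return len(s.encode("utf-8"))


def utf16_units(s: str) -> int:
    """Number of UTF-16 code units, i.e. 16-bit words, without a BOM."""
    return len(s.encode("utf-16-le")) // 2


# "naive" with a diaeresis, a space, and U+1D11E, which lies outside the BMP.
sample = "na\u00efve \U0001d11e"
print(code_points(sample), utf8_units(sample), utf16_units(sample))  # prints: 7 11 8
```

A code-unit-counted length(s) is cheap but can point into the middle of a character; a code-point-counted one matches what the user sees as characters (combining sequences aside) but has to scan the string for UTF-8 and UTF-16.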

There should be compiler options (for all native string types) to have the user select which of the two he wants to use for the generic s[i], length(s), pos(), copy(), delete(), ... notation.

With this provided, Lazarus would be able to provide whichever API for LCL they want in a decent and highly compatible way (They _should_ allow the user to select if he wants to link in an ANSIString, UTF8String or UCS2String version).

Moreover, this would allow for tuning a project for space or for speed according to the platform we want to compile it for (e.g. a 32-bit PC or an ARM-based cellphone).

-Michael
_______________________________________________
fpc-devel maillist  -  [email protected]
http://lists.freepascal.org/mailman/listinfo/fpc-devel
