On Mon, Aug 14, 2017 at 6:53 AM, Tony Whyman via Lazarus <lazarus@lists.lazarus-ide.org> wrote:
>
> On 13/08/17 12:18, Juha Manninen via Lazarus wrote:
>>
>> Unicode was designed to solve exactly the problems caused by locale
>> differences.
>> Why don't you use it?
>
> I believe you effectively answer your own question in your preceding post:
>
>> Actually using the Windows system codepage is not safe any more.
>> The current Unicode system in Lazarus maps AnsiString to use UTF-8.
>> Text with Windows codepage must be converted explicitly.
>> This is a breaking change compared to the old Unicode support in
>> Lazarus 1.4.x + FPC 2.6.x.
>
> If you are processing strings as "text" then you probably do not care how it
> is encoded and can live with "breaking changes". However, if, for some
> reason you are or need to be aware of how the text is encoded - or are using
> string types as a useful container for binary data - then types that sneak up
> on you with implicit type conversions, or which have semantics that change
> between compilers or versions, are just another source of bugs.
>
> PChar used to be a safe means to access binary data - but not anymore,
> especially if you move between FPC and Delphi. (One of my gripes is that the
> FCL still makes too much use of PChar instead of PByte, with the resulting
> Delphi incompatibility.) The "string" type also used to be a safe container
> for any sort of binary data, but when its definition can change between
> compilers and versions, it is now something to be avoided.
>
> As a general rule, I now always use PByte for any sort of string that is
> binary, untyped or whose encoding is still to be determined. It works across
> compilers (FPC and Delphi) with consistent semantics and is safe for such use.
>
> I also really like AnsiString from FPC 3.0 onwards. By making the encoding a
> dynamic attribute of the type, it means that I know what is in the container
> and can keep control.
>
> I am sorry, but I would only even consider using UnicodeString as a type
> (or as the default string type) when I am just processing text for which the
> encoding is a don't-care, such as a window caption, or for intensive text
> analysis. If I am reading/writing text from a file or database where the
> encoding is often implicit and may vary from the Unicode standard, then my
> preference is for AnsiString. I can then read the text (e.g. from the file)
> into a RawByteString buffer, set the encoding and then process it safely,
> often avoiding the overhead of any transliteration. PByte comes into
> its own when the file contains a mixture of binary data and text.
>
> Text files and databases tend to use UTF-8 or are encoded using legacy
> Windows code pages. The Chinese also have GB18030. With a database, the
> encoding is usually known, and AnsiString is a good way to read/write data
> and to convey the encoding, especially as databases usually use a variable
> length multi-byte encoding natively and not UTF-16. With files, the
> text encoding is usually implicit, and AnsiString is ideal for this as it
> lets you read in the text and then assign the (implicit) encoding to the
> string, or ensure the correct encoding when writing.
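A minimal Free Pascal sketch of the "read into a RawByteString, then set the encoding" approach described above, assuming FPC 3.0 or later. The bytes are built in memory to stand in for file contents, code page 1252 is only an example, and `BytesToRaw` is a hypothetical helper; `SetCodePage` and `StringCodePage` are the System-unit routines for code-page-aware strings.

```pascal
program TagEncoding;

{$mode objfpc}{$H+}

// Hypothetical helper: copy raw bytes (e.g. file contents whose encoding
// is known out of band) into a RawByteString without any conversion.
function BytesToRaw(const Bytes: array of Byte): RawByteString;
begin
  Result := '';
  SetLength(Result, Length(Bytes));
  if Length(Bytes) > 0 then
    Move(Bytes[0], Result[1], Length(Bytes));
end;

var
  Raw: RawByteString;
begin
  // 'caf' followed by $E9 is "cafe" with e-acute in Windows-1252.
  Raw := BytesToRaw([$63, $61, $66, $E9]);
  // Attach the known encoding; Convert = False relabels the string
  // and leaves the bytes untouched (no transliteration).
  SetCodePage(Raw, 1252, False);
  WriteLn('Code page: ', StringCodePage(Raw));
  WriteLn('Byte count: ', Length(Raw));
end.
```

Because `Convert` is `False`, only the code page attribute changes. Assigning `Raw` to a `UTF8String` afterwards would trigger an implicit conversion, which on Unix-like systems may need a widestring manager such as the `cwstring` unit.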
Unicode everywhere and you using AnsiString and doing everything...
Now I'm confused.

> And anyway, I do most of my work in Linux, so why would I even want to
> bother myself with arrays of widechars when the default environment is UTF8?

Maybe you do not have problems because you don't use Windows.

> We do need some stability and consistency in strings which, as someone else
> noted, have been confused by Embarcadero. I would like to see that focused on
> AnsiString, with UnicodeString being only for specialist use on Windows or
> when intensive text analysis makes a two-byte encoding more efficient than a
> variable length multi-byte encoding.

FPC and Lazarus claim to be cross-platform, and that is a fact. Because of
that, IMHO, both should be used in one and the same way on every system,
don't you think?

Best regards,
Marcos Douglas

--
_______________________________________________
Lazarus mailing list
Lazarus@lists.lazarus-ide.org
https://lists.lazarus-ide.org/listinfo/lazarus