On Fri, Oct 21, 2016 at 5:08 PM, Jürgen Hestermann via Lazarus <lazarus@lists.lazarus-ide.org> wrote:
> And again we are at the point where you need to understand what goes on
> under the hood... ;-)

Yes, but that is true of any programming. I am truly happy that we have Unicode instead of the old system codepages. I remember seeing text full of question marks a lot in the past, but not any more. Things are getting better...
I don't even know how codepages worked when one text contained many languages. I don't even care now because we have Unicode. :)

On Fri, Oct 21, 2016 at 5:15 PM, Jürgen Hestermann via Lazarus <lazarus@lists.lazarus-ide.org> wrote:
> The problem is that Unicode has a code point for "á" but
> also allows composing this character by having an "a"
> and an "´" printed over each other.
> I will never understand why this was allowed because
> I thought that Unicode was introduced to overcome such
> issues by defining a huge number of code points directly.
>
> Nevertheless, if you have such a situation then you cannot
> search for a byte sequence as there are 2 possible representations
> of the same character.

That is all true, although Gabor's problem was not caused by it. His LCL app used the default UTF-8 strings, but the console program used a Windows codepage. Adding to the confusion, the Windows console codepage is different from the system codepage (if I have understood right). This is another reason to use the default UTF-8 system: it handles all of this behind the scenes.

> I have given up on taking care of such composed characters
> and assume that all Unicode strings are normalized.

I have understood that the composed version (many codepoints / character) is the recommended normalized one. We must support it properly in the future.
The combining rules are extremely complex. Benjamin Rosseaux (BeRo in the forum) has code for it. There was some other code, too. I must dive into it sometime in the future.
In fact we have simple code for combined accented characters in the LazUnicode unit, despite what I wrote earlier in this thread. It was basically copied from SynEdit.
I will write another post...

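To make the two issues above concrete, here is a minimal sketch in Python (not Lazarus or LazUnicode code; it only uses Python's standard `unicodedata` module). It shows that a byte-sequence search misses "á" when the text stores it as "a" plus a combining accent, that normalizing both sides first makes the search work, and what UTF-8 bytes look like when a program reads them with a Windows codepage. The function names and the sample word are made up for illustration, and cp1252 is only an assumed example codepage.

```python
import unicodedata

# Two representations of the same visible character "á".
precomposed = "\u00e1"  # U+00E1 LATIN SMALL LETTER A WITH ACUTE
combining = "a\u0301"   # "a" followed by U+0301 COMBINING ACUTE ACCENT

# Sample text that happens to store the accent as a combining mark.
text = "p" + combining + "gina"


def contains_naive(haystack, needle):
    """Plain byte-sequence search on the UTF-8 encodings."""
    return needle.encode("utf-8") in haystack.encode("utf-8")


def contains_normalized(haystack, needle, form="NFC"):
    """Normalize both strings to the same form, then search."""
    h = unicodedata.normalize(form, haystack)
    n = unicodedata.normalize(form, needle)
    return n in h


def misread_utf8_as(codepage, s):
    """Show what UTF-8 bytes look like when decoded with a codepage."""
    return s.encode("utf-8").decode(codepage)


if __name__ == "__main__":
    print(precomposed.encode("utf-8"))             # 2 bytes, 1 code point
    print(combining.encode("utf-8"))               # 3 bytes, 2 code points
    print(contains_naive(text, precomposed))       # byte search misses it
    print(contains_normalized(text, precomposed))  # found after normalizing
    print(misread_utf8_as("cp1252", precomposed))  # garbled output
```

The sketch normalizes to NFC by default, but the search works the same with NFD as long as both strings use the same form. It says nothing about grapheme boundaries or the more complex combining rules mentioned above.
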
Juha
-- 
_______________________________________________
Lazarus mailing list
Lazarus@lists.lazarus-ide.org
http://lists.lazarus-ide.org/listinfo/lazarus