Michael Van Canneyt wrote:

XML is not in UTF-8.

What do you mean by that? I don't understand. If you delete the <?xml
... ?> line and save the rest in a text file, the "file" command says
"UTF-8 Unicode text", and if the file is opened with a web browser (e.g.
Firefox) the encoding is also shown as UTF-8. Opening the file with
mcview in hex mode, I saw that Latin characters are encoded with 1 byte
and Cyrillic ones with 2. I manually checked the values of these bytes
against a reference table. The same value is also returned by
(Ch and $1F) shl 6 + (Ch2 and $3F) in function InternalGetChar. After
the value is assigned to FCurChar, "?" characters appear.

       FCurChar := WideChar((Ch and $1F) shl 6 + (Ch2 and $3F));
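
To sanity-check that expression outside the parser, here is a minimal Free Pascal sketch that decodes one two-byte UTF-8 sequence by hand. The program and the function name DecodeTwoByteUTF8 are made up for illustration and are not part of the XML unit. The bytes $D0 $96 are the UTF-8 encoding of U+0416 (Cyrillic capital letter Zhe).

```pascal
program Utf8TwoByteDemo;
{$mode objfpc}

{ Hypothetical helper, not taken from the XML unit.
  Decodes a two-byte UTF-8 sequence (110xxxxx 10xxxxxx) into a code point.
  Returns 0 if the bytes do not form a two-byte sequence. }
function DecodeTwoByteUTF8(Ch, Ch2: Byte): Word;
begin
  Result := 0;
  if ((Ch and $E0) = $C0) and ((Ch2 and $C0) = $80) then
    { same arithmetic as the expression quoted from InternalGetChar }
    Result := ((Ch and $1F) shl 6) + (Ch2 and $3F);
end;

var
  CodePoint: Word;
begin
  { $D0 $96 is UTF-8 for U+0416 }
  CodePoint := DecodeTwoByteUTF8($D0, $96);
  WriteLn('U+', HexStr(CodePoint, 4));
end.
```

If this prints U+0416, the arithmetic is fine and the "?" is probably introduced later, for example when the WideChar is converted to a single-byte codepage that has no Cyrillic letters. That last part is only a guess and is not confirmed anywhere in this thread.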


The XML unit is now XML 1.1 compliant. It performs the conversion by
itself; part of this change was indeed made during the 2.0.4 release.

 { supported encodings }
 TEncoding = (enUnknown, enUTF8, enUTF16BE, enUTF16LE);

Does CP1251 work?


My guess is that you simply don't need to do any conversion yourself.

I was doing UTF-8 to CP1251 conversion, but now the input I expected to
be UTF-8 doesn't look right to me. :)
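
For reference, if the parser already hands back decoded code points, the remaining CP1251 step could look roughly like the sketch below. The function name CodePointToCP1251 is hypothetical, and the sketch covers only ASCII, the basic Russian alphabet (U+0410..U+044F) and the letters Io (U+0401, U+0451). Anything else is replaced with "?".

```pascal
program CodePointToCP1251Demo;
{$mode objfpc}

{ Hypothetical helper: map a Unicode code point to a CP1251 byte.
  Only ASCII, U+0410..U+044F, U+0401 and U+0451 are handled;
  every other code point becomes '?'. }
function CodePointToCP1251(CodePoint: Word): Byte;
begin
  if CodePoint < $80 then
    Result := Byte(CodePoint)                   { ASCII is unchanged }
  else if (CodePoint >= $0410) and (CodePoint <= $044F) then
    Result := Byte(CodePoint - $0410 + $C0)     { contiguous block $C0..$FF }
  else if CodePoint = $0401 then
    Result := $A8                               { capital Io }
  else if CodePoint = $0451 then
    Result := $B8                               { small io }
  else
    Result := Ord('?');
end;

begin
  { U+0416 (capital Zhe) maps to byte $C6 in CP1251 }
  WriteLn(HexStr(CodePointToCP1251($0416), 2));
end.
```

A table-driven version would be needed for the rest of CP1251 (Ukrainian, Belarusian and Serbian letters, typographic quotes and so on), since those code points are not laid out contiguously.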


I am going to test tomorrow with different text encodings and see what
will happen.
_______________________________________________
fpc-pascal maillist  -  fpc-pascal@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-pascal
