Michael Van Canneyt wrote:
> XML is not in UTF-8.
What do you mean by that? I don't understand. If I delete the <?xml ... ?> line and save the rest as a text file, the "file" command reports "UTF-8 Unicode text", and a web browser (e.g. Firefox) also shows the encoding as UTF-8. Opening the file with mcview in hex mode, I saw that Latin characters are encoded with 1 byte and Cyrillic characters with 2. I manually checked the values of these bytes against a reference table. The same value is also returned by (Ch and $1F) shl 6 + (Ch2 and $3F) in function InternalGetChar. However, after the value is assigned to FCurChar, "?" characters appear:

    FCurChar := WideChar((Ch and $1F) shl 6 + (Ch2 and $3F));
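To show the arithmetic in isolation, here is a minimal sketch of the two-byte decoding step. It is not the code from the XML unit: the program name and the DecodeTwoByte helper are made up for illustration, and the sample bytes $D0 $96 (the UTF-8 encoding of U+0416, Cyrillic capital ZHE) are simply a value I picked from the reference table.

```pascal
program Utf8TwoByte;
{$mode objfpc}

{ Hypothetical helper, not part of the XML unit: combines the 5 payload
  bits of the lead byte (110xxxxx) with the 6 payload bits of the
  trail byte (10xxxxxx) into one code point. }
function DecodeTwoByte(Ch, Ch2: Byte): Word;
begin
  Result := ((Ch and $1F) shl 6) + (Ch2 and $3F);
end;

var
  W: WideChar;
begin
  { $D0 $96 is the UTF-8 encoding of U+0416. }
  W := WideChar(DecodeTwoByte($D0, $96));
  { Print the numeric code point instead of the character itself, so the
    console encoding cannot influence what is shown. }
  WriteLn('U+', HexStr(Ord(W), 4));  { prints U+0416 }
end.
```

Printing the code point as a number rather than as a character keeps the check independent of how the terminal renders wide characters, so it only tests the expression itself.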
> The XML unit is now XML 1.1 compliant. It performs the conversion by itself; part of this change was indeed made for the 2.0.4 release.
    { supported encodings }
    TEncoding = (enUnknown, enUTF8, enUTF16BE, enUTF16LE);

Does CP1251 work?
> My guess is that you simply don't need to do any conversion yourself.
I was doing UTF-8 to CP1251 conversion, but now the input I expected to be UTF-8 doesn't look right to me. :) I am going to test tomorrow with different text encodings and see what happens.