Helge Hafting wrote:
Angus Leeming wrote:
Abdelrazak Younes <[EMAIL PROTECTED]> writes:
Can you try to change the utf-8 to ucs-4 conversion to use either
"UCS-4BE" or "UCS-4LE", instead of "UCS-4"? Also the conversion the
other way ucs-4 -> ucs-2, try with UCS-2BE and UCS-2LE.
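
For readers following along, here is a minimal sketch of what the two byte orders mean for a single character. It uses Python's built-in `utf-32-be` and `utf-32-le` codecs as stand-ins for iconv's "UCS-4BE" and "UCS-4LE"; that mapping, and the helper name `ucs4_bytes`, are assumptions made for illustration and are not taken from the LyX code.

```python
# Sketch: the same character converted from UTF-8 to big-endian and
# little-endian UCS-4. Python's utf-32-* codecs stand in for iconv's
# UCS-4BE / UCS-4LE (an assumption made for this illustration).

def ucs4_bytes(text, byte_order):
    """Encode text as 4-byte code units in the requested byte order."""
    codec = {"BE": "utf-32-be", "LE": "utf-32-le"}[byte_order]
    return text.encode(codec)

# U+00E9 (e with acute accent), as it would arrive from a UTF-8 file.
utf8_input = "\u00e9".encode("utf-8")
text = utf8_input.decode("utf-8")

print(ucs4_bytes(text, "BE").hex())  # 000000e9
print(ucs4_bytes(text, "LE").hex())  # e9000000
```

The two results hold the same four bytes in reverse order, so whatever consumes the buffer has to agree with the producer on which order is in use.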

And with the attached patch, where I have put LE everywhere, the text is displayed correctly but the inset buttons are not. I guess I have gone too far... we need a combination of the second patch (unicode_little_endian) and this one (unicode_little_endian_full).

Intel-based PCs use Little Endian byte order. Often Windows file formats use the BOM (http://en.wikipedia.org/wiki/Byte_Order_Mark) to make it trivial for the executable decoding the UTF-8 file format to decide whether the file was stored in Big Endian or Little Endian format. The BOM tends not to be used on Unix machines though as it messes with the sh-bang mechanism and isn't actually needed at all anyway...

The fact that you need to tell LyX to convert your files from UTF-8 to the Little Endian flavour of UCS-4 etc suggests that your UTF-8 files are encoded in Big Endian format. What happens if you run such a file through iconv, converting explicitly to UTF-8LE? Do the two files compare identical or are they indeed changed?
I thought UTF-8 didn't care about endianness, being a single-byte
encoding?

You have a point here. Most (all) of the characters used in the English doc are single byte. So even if my files are encoded in utf8-BE it shouldn't matter anyway. But FYI, utf8 characters can be multiple bytes.
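
To illustrate the multi-byte point, a small sketch that counts how many bytes a few sample characters take in UTF-8. The sample characters and the helper name `utf8_length` are chosen for illustration only and do not come from the LyX documents under discussion.

```python
# Sketch: UTF-8 is a variable-length encoding. ASCII characters take
# one byte; other characters take two, three, or four bytes.

def utf8_length(ch):
    """Return the number of bytes ch occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))

samples = {
    "A": "A",                         # U+0041, plain ASCII
    "e-acute": "\u00e9",              # U+00E9
    "euro sign": "\u20ac",            # U+20AC
    "musical G clef": "\U0001d11e",   # U+1D11E, outside the BMP
}

for name, ch in samples.items():
    print(name, utf8_length(ch))
```

A document that contains only ASCII characters is therefore indistinguishable from a single-byte file, which would explain why the English doc shows no difference.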

A conversion to UCS-4 can go wrong if he converts
to the wrong UCS-4 format, but utf-8 is supposed to be
the same no matter what endianness the machine uses?
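
A rough sketch of both halves of that statement, again assuming Python's `utf-32-be` and `utf-32-le` codecs as stand-ins for the iconv UCS-4 variants (the helper `decodes_cleanly` and the sample string are hypothetical): UTF-8 comes out byte-identical whichever UCS-4 flavour it is produced from, while reading UCS-4 data with the wrong byte order fails.

```python
# Sketch: UTF-8 has no byte-order variants, but UCS-4 does.

def decodes_cleanly(data, codec):
    """Return True if data is valid in the given codec, else False."""
    try:
        data.decode(codec)
        return True
    except UnicodeDecodeError:
        return False

text = "LyX \u00e9"

# The same text stored as big-endian and little-endian UCS-4.
be = text.encode("utf-32-be")
le = text.encode("utf-32-le")

# Converting either buffer to UTF-8 yields the same byte sequence.
utf8_from_be = be.decode("utf-32-be").encode("utf-8")
utf8_from_le = le.decode("utf-32-le").encode("utf-8")
print(utf8_from_be == utf8_from_le)

# Reading little-endian data as big-endian does not work: the swapped
# bytes form values far outside the Unicode code point range.
print(decodes_cleanly(le, "utf-32-le"))
print(decodes_cleanly(le, "utf-32-be"))
```

So if the display is wrong, the UCS-4 side of the conversion is the place to look, not the UTF-8 file.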

Helge Hafting

