Re: lyx2lyx bug? (was Re: Track change status Michael? (Re: [PATCH] small setBuffer cleanup)

Abdelrazak Younes Mon, 21 Aug 2006 07:58:39 -0700

Helge Hafting wrote:

Angus Leeming wrote:
Abdelrazak Younes <[EMAIL PROTECTED]> writes:
Can you try to change the utf-8 to ucs-4 conversion to use either
"UCS-4BE" or "UCS-4LE", instead of "UCS-4"? Also the conversion the
other way ucs-4 -> ucs-2, try with UCS-2BE and UCS-2LE.
And with the attached patch where I have put LE everywhere, the text is displayed correctly but the inset buttons are not. I guess I have gone too far... we need a combination of the second patch (unicode_little_endian) and this one (unicode_little_endian_full).
Intel-based PCs use Little Endian byte order. Often Windows file formats use the BOM (http://en.wikipedia.org/wiki/Byte_Order_Mark) to make it trivial for the executable decoding the UTF-8 file format to decide whether the file was stored in Big Endian or Little Endian format. The BOM tends not to be used on Unix machines though as it messes with the sh-bang mechanism and isn't actually needed at all anyway...
The fact that you need to tell LyX to convert your files from UTF-8 to the Little Endian flavour of UCS-4 etc suggests that your UTF-8 files are encoded in Big Endian format. What happens if you run such a file through iconv, converting explicitly to UTF-8LE? Do the two files compare identical or are they indeed changed?
I thought UTF-8 didn't care about endianness, being a single-byte
encoding?

You have a point here. Most (all) of the characters used in the English doc are single byte. So even if my files are coded in utf8-BE it shouldn't matter anyway. But FYI, utf8 characters can be multiple bytes.

A conversion to UCS-4 can go wrong if he converts
to the wrong UCS-4 format, but utf-8 is supposed to be
the same no matter what endianness the machine uses?

Helge Hafting
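
For anyone who wants to check the endianness point above, here is a minimal sketch in Python (not LyX code, and not the iconv calls from the patches). It assumes Python's "utf-32-be" and "utf-32-le" codecs as stand-ins for iconv's "UCS-4BE" and "UCS-4LE"; the sample string and the helper names swap32 and decodes_cleanly are made up for illustration. It shows that the UTF-8 bytes are a fixed byte sequence, while the two UCS-4 flavours differ only by the byte order inside each 4-byte unit, and that reading one flavour as the other fails.

```python
import codecs

# Hypothetical sample text: U+00E9 (e acute) and U+20AC (euro sign).
text = "\u00e9\u20ac"

# UTF-8 is a sequence of single bytes, so there is no byte order to choose.
utf8 = text.encode("utf-8")

# Assumption: Python's UTF-32 codecs stand in for iconv's UCS-4BE / UCS-4LE.
ucs4_be = text.encode("utf-32-be")
ucs4_le = text.encode("utf-32-le")


def swap32(data: bytes) -> bytes:
    """Reverse the bytes inside every 4-byte unit (illustrative helper)."""
    return b"".join(data[i:i + 4][::-1] for i in range(0, len(data), 4))


def decodes_cleanly(data: bytes, codec: str) -> bool:
    """Return True if data is valid in the given codec (illustrative helper)."""
    try:
        data.decode(codec)
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    print("utf-8   :", utf8.hex(" "))
    print("ucs-4 BE:", ucs4_be.hex(" "))
    print("ucs-4 LE:", ucs4_le.hex(" "))
    print("LE is byte-swapped BE:", swap32(ucs4_be) == ucs4_le)
    print("LE read as BE decodes:", decodes_cleanly(ucs4_le, "utf-32-be"))
    # The UTF-8 "BOM" is the same three bytes on every machine.
    print("utf-8 BOM:", codecs.BOM_UTF8.hex(" "))
```

If the UCS-4 byte order chosen for the conversion does not match what the consumer of the buffer expects, the result is the kind of garbage or failure seen here; the UTF-8 side of the conversion does not change with the machine's endianness.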
