Re: Windows Unicode and GCC

Nicolas De Rico Tue, 25 Apr 2006 09:08:55 -0700

Hello and thank you for the reply.

I created 3 files (very simple hello world program):


hi.c: UTF-8 without BOM
hi-8.c: UTF-8 with BOM
hi-16.c: UTF-16 with BOM

I ran iconv twice for each file: once with the -f option, which explicitly indicates the source encoding, and once without the -f option to see if libiconv is able to detect the encoding from the BOM. In all cases I told iconv to create a UTF-8 file, and I used od (octal dump) to inspect the resulting file.
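For anyone who wants to repeat the explicit-encoding run (case 2) from C rather than from the command line, here is a minimal sketch using the standard iconv(3) calls iconv_open, iconv and iconv_close. The helper name to_utf8 and its buffer-based interface are made up for illustration; they are not part of libiconv or cpp. On some systems this needs to be linked with -liconv.

```c
#include <iconv.h>
#include <stddef.h>

/*
 * Hypothetical helper: convert `inlen` bytes of `in`, encoded as `from`,
 * to UTF-8 in `out` (capacity `outcap`).
 * Returns the number of bytes written, or (size_t)-1 on any failure.
 */
size_t to_utf8(const char *from, const char *in, size_t inlen,
               char *out, size_t outcap)
{
    /* Note the argument order: iconv_open(tocode, fromcode). */
    iconv_t cd = iconv_open("UTF-8", from);
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    /* iconv() takes char **, but it only reads from the input buffer. */
    char *inp = (char *)in;
    char *outp = out;
    size_t inleft = inlen;
    size_t outleft = outcap;

    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    iconv_close(cd);

    /* Treat illegal sequences, truncated input and a full output buffer as errors. */
    if (rc == (size_t)-1 || inleft != 0)
        return (size_t)-1;

    return outcap - outleft;
}
```

Calling to_utf8("UTF-16LE", buf, len, out, sizeof out) corresponds to the run with -f. What happens when a BOM-capable name such as "UTF-16" is passed instead depends on the iconv implementation, which is exactly the open question here.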


My results:
1: without -f option
2: with -f option

hi.c (1):    UTF-8, without BOM
hi.c (2):    UTF-8, without BOM
hi-8.c (1):  UTF-8, with BOM *
hi-8.c (2):  UTF-8, with BOM *
hi-16.c (1): illegal character error. Does not use BOM automatically!
hi-16.c (2): UTF-8, without BOM

Considering those results, it looks a bit like I'll have to bug the libiconv crew!


Presumably, cpp wants everything from libiconv in UTF-8 with no BOM.


Nick

* Did libiconv really consider the BOM, or did it just copy the file? I have to investigate. libiconv may just not support the BOM at all!





Eric Christopher wrote:

It seems that BOM is a Unicode UTF facility that MS thought was a great thing to implement, and I certainly agree with that assessment. BOM tells even more than its name implies. A program can detect if a file is encoded in UTF-8, 16LE, 16BE, 32LE and 32BE in a very easy way.
I think that it would be good for gcc (or cpp) to support this because it would make for better interoperability with Visual C++, and it would allow each file to indicate how it is encoded without having to rely on some setting that may or may not provide the correct information in every case.
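The BOM check described above could be sketched roughly as follows. This is only an illustration of the idea, not what cpp or libiconv actually do; the names bom_kind and detect_bom are invented for the example. The byte patterns are the standard encodings of U+FEFF in each UTF form.

```c
#include <stddef.h>
#include <string.h>

/* Encodings that a leading BOM can identify (names are hypothetical). */
typedef enum {
    BOM_NONE,
    BOM_UTF8,
    BOM_UTF16LE,
    BOM_UTF16BE,
    BOM_UTF32LE,
    BOM_UTF32BE
} bom_kind;

/*
 * Inspect the first bytes of a buffer and report which BOM, if any, it starts with.
 * UTF-32LE is tested before UTF-16LE because FF FE 00 00 also begins with FF FE.
 * (A UTF-16LE file whose first character after the BOM is U+0000 would be
 * misreported as UTF-32LE; that ambiguity is inherent to BOM sniffing.)
 */
bom_kind detect_bom(const unsigned char *buf, size_t len)
{
    static const unsigned char utf32le[] = { 0xFF, 0xFE, 0x00, 0x00 };
    static const unsigned char utf32be[] = { 0x00, 0x00, 0xFE, 0xFF };
    static const unsigned char utf8[]    = { 0xEF, 0xBB, 0xBF };
    static const unsigned char utf16le[] = { 0xFF, 0xFE };
    static const unsigned char utf16be[] = { 0xFE, 0xFF };

    if (len >= 4 && memcmp(buf, utf32le, 4) == 0) return BOM_UTF32LE;
    if (len >= 4 && memcmp(buf, utf32be, 4) == 0) return BOM_UTF32BE;
    if (len >= 3 && memcmp(buf, utf8, 3) == 0)    return BOM_UTF8;
    if (len >= 2 && memcmp(buf, utf16le, 2) == 0) return BOM_UTF16LE;
    if (len >= 2 && memcmp(buf, utf16be, 2) == 0) return BOM_UTF16BE;
    return BOM_NONE;
}
```

A caller would read the first four bytes of the source file, pass them to detect_bom, and fall back to whatever default encoding is configured when the result is BOM_NONE.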
cpp relies on libiconv for almost all of its translation support. Try preprocessing a file with iconv and see if you can compile it afterwards. If you can, then it's a gcc bug; otherwise you'll need to bug the libiconv folks about implementing support.
-eric
