Hi,

Yes, I was talking about the byte order mark (BOM):

http://www.unicode.org/faq/utf_bom.html

It seems that the BOM is a standard Unicode facility that MS thought was a great thing to implement, and I certainly agree with that assessment. The BOM tells more than its name implies: besides the byte order, it identifies the encoding itself. A program can easily detect whether a file is encoded in UTF-8, UTF-16LE, UTF-16BE, UTF-32LE or UTF-32BE just by looking at its first few bytes.
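To illustrate how little code the detection takes, here is a rough sketch in C. The names bom_kind and detect_bom are made up for this example and are not taken from gcc or cpp; the byte sequences are the ones listed in the Unicode FAQ. The only subtlety I know of is that the UTF-32LE mark (FF FE 00 00) begins with the UTF-16LE mark (FF FE), so the longer marks have to be tested first.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names for this sketch; nothing here comes from gcc or cpp. */
enum bom_kind {
    BOM_NONE, BOM_UTF8, BOM_UTF16LE, BOM_UTF16BE, BOM_UTF32LE, BOM_UTF32BE
};

/* Inspect the first n bytes of buf and report which BOM, if any, they
   start with. If len is not NULL, it receives the BOM size in bytes
   (0 when no BOM is found). */
static enum bom_kind detect_bom(const unsigned char *buf, size_t n, size_t *len)
{
    static const struct {
        enum bom_kind kind;
        size_t size;
        unsigned char bytes[4];
    } table[] = {
        /* UTF-32 comes first: FF FE 00 00 starts with the UTF-16LE mark. */
        { BOM_UTF32LE, 4, { 0xFF, 0xFE, 0x00, 0x00 } },
        { BOM_UTF32BE, 4, { 0x00, 0x00, 0xFE, 0xFF } },
        { BOM_UTF8,    3, { 0xEF, 0xBB, 0xBF } },
        { BOM_UTF16LE, 2, { 0xFF, 0xFE } },
        { BOM_UTF16BE, 2, { 0xFE, 0xFF } },
    };
    size_t i;

    assert(buf != NULL || n == 0);

    for (i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (n >= table[i].size
            && memcmp(buf, table[i].bytes, table[i].size) == 0) {
            if (len != NULL)
                *len = table[i].size;
            return table[i].kind;
        }
    }
    if (len != NULL)
        *len = 0;
    return BOM_NONE;
}
```

A caller would read the first four bytes of the file, pass them to detect_bom, and skip len bytes before decoding the rest. One caveat of this sketch: a UTF-16LE file whose first character after the BOM is U+0000 would be misreported as UTF-32LE, which should be very rare in source files.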

I think it would be good for gcc (or cpp) to support the BOM: it would improve interoperability with Visual C++, and it would let each file indicate how it is encoded instead of relying on some setting that may or may not be correct in every case.



Nicolas





Ranjit Mathew wrote:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Mike Hearn wrote:
> On Mon, 24 Apr 2006 15:27:07 -0400, Nicolas De Rico wrote:
>> I would like to compile files created on Windows and encoded in "Unicode" (UTF-8 or UTF-16). Microsoft puts a little header at the beginning of files to indicate that they are UTF-16, UTF-8, etc. I believe that this header is standard unicode btw, not an extension!
> Are you thinking of the byte order mark (BOM)? If so then this is a quirk
> of UTF-16 and is a Windows thing that many apps can't handle correctly ...
> UTF-8 should not have any headers at all and GCC should handle them fine.
> Try using some text editor to check it really is UTF-8.

Windows Notepad still inserts a BOM (0xEF 0xBB 0xBF) at
the beginning of files encoded with UTF-8. See:

  http://www.microsoft.com/globaldev/getwr/steps/wrg_unicode.mspx
  http://en.wikipedia.org/wiki/Byte_Order_Mark

Ranjit.

- --
Ranjit Mathew      Email: rmathew AT gmail DOT com

Bangalore, INDIA.    Web: http://rmathew.com/


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFETfs0Yb1hx2wRS48RAkvmAKCae/o9vD3doaDKD1VPOSUlSlhRjACdGqv0
nD0cMiSvZLu9TfmIf/BUuIU=
=lZaM
-----END PGP SIGNATURE-----

