Hi,
Yes, I was talking about the byte order mark (BOM):
http://www.unicode.org/faq/utf_bom.html
It seems that the BOM is a standard Unicode facility that Microsoft
chose to implement, and I certainly agree with that choice. The BOM
tells more than its name implies: besides the byte order, it identifies
the encoding form, so a program can easily detect whether a file is
encoded in UTF-8, UTF-16LE, UTF-16BE, UTF-32LE or UTF-32BE by checking
its first few bytes.
I think it would be good for gcc (or cpp) to support this because it
would improve interoperability with Visual C++, and it would allow each
file to indicate its own encoding instead of relying on some setting
that may or may not be correct in every case.
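
To illustrate how simple the check is, here is a rough sketch in C. It
is only a sketch: detect_bom and the bom_encoding enum are names made up
for this example, not anything that exists in gcc or cpp, and the byte
values are the BOM signatures listed in the Unicode FAQ linked above.

```c
#include <stddef.h>

/* Encodings a BOM can identify; ENC_UNKNOWN means no BOM was found. */
typedef enum {
    ENC_UNKNOWN,
    ENC_UTF8,
    ENC_UTF16LE,
    ENC_UTF16BE,
    ENC_UTF32LE,
    ENC_UTF32BE
} bom_encoding;

/* Hypothetical helper (not a gcc/cpp API): inspect the first bytes of a
 * buffer and report which BOM, if any, it starts with. */
bom_encoding detect_bom(const unsigned char *buf, size_t len)
{
    /* UTF-32LE must be tested before UTF-16LE, because its BOM
     * (FF FE 00 00) begins with the UTF-16LE BOM (FF FE). */
    if (len >= 4 && buf[0] == 0xFF && buf[1] == 0xFE &&
        buf[2] == 0x00 && buf[3] == 0x00)
        return ENC_UTF32LE;
    if (len >= 4 && buf[0] == 0x00 && buf[1] == 0x00 &&
        buf[2] == 0xFE && buf[3] == 0xFF)
        return ENC_UTF32BE;
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
        return ENC_UTF8;
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE)
        return ENC_UTF16LE;
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF)
        return ENC_UTF16BE;
    return ENC_UNKNOWN;
}
```

A file without a BOM yields ENC_UNKNOWN, so a fallback such as gcc's
-finput-charset option would still be needed in that case. One caveat:
a UTF-16LE file whose first character after the BOM is U+0000 looks
like UTF-32LE to this check, although that should be very rare in
source files.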
Nicolas
Ranjit Mathew wrote:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Mike Hearn wrote:
On Mon, 24 Apr 2006 15:27:07 -0400, Nicolas De Rico wrote:
I would like to compile files created on Windows and encoded in
"Unicode" (UTF-8 or UTF-16). Microsoft puts a little header at the
beginning of files to indicate that they are UTF-16, UTF-8, etc. I
believe that this header is standard unicode btw, not an extension!
Are you thinking of the byte order mark (BOM)? If so then this is a quirk
of UTF-16 and is a Windows thing that many apps can't handle correctly ...
UTF-8 should not have any headers at all and GCC should handle them fine.
Try using some text editor to check it really is UTF-8.
Windows Notepad still inserts a BOM (0xEF 0xBB 0xBF) at
the beginning of files encoded with UTF-8. See:
http://www.microsoft.com/globaldev/getwr/steps/wrg_unicode.mspx
http://en.wikipedia.org/wiki/Byte_Order_Mark
Ranjit.
- --
Ranjit Mathew Email: rmathew AT gmail DOT com
Bangalore, INDIA. Web: http://rmathew.com/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iD8DBQFETfs0Yb1hx2wRS48RAkvmAKCae/o9vD3doaDKD1VPOSUlSlhRjACdGqv0
nD0cMiSvZLu9TfmIf/BUuIU=
=lZaM
-----END PGP SIGNATURE-----