Re: Windows Unicode and GCC

Nicolas De Rico Tue, 25 Apr 2006 07:46:49 -0700

Hi,

Yes, I was talking about the byte order mark (BOM):


http://www.unicode.org/faq/utf_bom.html

It seems that the BOM is a Unicode facility that MS thought was a great thing to implement, and I certainly agree with that assessment. The BOM tells even more than its name implies: a program can easily detect whether a file is encoded in UTF-8, UTF-16LE, UTF-16BE, UTF-32LE or UTF-32BE.
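
To make the detection concrete, here is a minimal sketch in C of how a program could check the first bytes of a file against the BOM signatures listed in the Unicode FAQ. The names bom_kind and detect_bom are made up for this example and are not part of GCC, cpp, or any library. One caveat: a UTF-16LE file whose first character after the BOM is U+0000 looks the same as a UTF-32LE BOM, so this is a heuristic, not a proof of the encoding.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical names for this sketch; nothing here is an existing GCC API. */
enum bom_kind {
    BOM_NONE, BOM_UTF8, BOM_UTF16LE, BOM_UTF16BE, BOM_UTF32LE, BOM_UTF32BE
};

/* Report which BOM, if any, the buffer starts with.
   If bom_len is non-NULL, it receives the BOM length in bytes (0 if none). */
enum bom_kind detect_bom(const unsigned char *buf, size_t len, size_t *bom_len)
{
    static const struct {
        enum bom_kind kind;
        size_t len;
        unsigned char bytes[4];
    } table[] = {
        /* UTF-32LE is tested before UTF-16LE: both start with FF FE. */
        { BOM_UTF32LE, 4, { 0xFF, 0xFE, 0x00, 0x00 } },
        { BOM_UTF32BE, 4, { 0x00, 0x00, 0xFE, 0xFF } },
        { BOM_UTF8,    3, { 0xEF, 0xBB, 0xBF } },
        { BOM_UTF16LE, 2, { 0xFF, 0xFE } },
        { BOM_UTF16BE, 2, { 0xFE, 0xFF } }
    };
    size_t i;

    for (i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (len >= table[i].len
            && memcmp(buf, table[i].bytes, table[i].len) == 0) {
            if (bom_len)
                *bom_len = table[i].len;
            return table[i].kind;
        }
    }
    if (bom_len)
        *bom_len = 0;
    return BOM_NONE;
}
```

A caller would read the first four bytes of the file, pass them to detect_bom, and then skip bom_len bytes before decoding the rest.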

I think that it would be good for gcc (or cpp) to support this because it would make for better interoperability with Visual C++, and it would allow each file to indicate how it is encoded without having to rely on some setting that may or may not provide the correct information in every case.




Nicolas





Ranjit Mathew wrote:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Mike Hearn wrote:

On Mon, 24 Apr 2006 15:27:07 -0400, Nicolas De Rico wrote:
I would like to compile files created on Windows and encoded in "Unicode" (UTF-8 or UTF-16). Microsoft puts a little header at the beginning of files to indicate that they are UTF-16, UTF-8, etc. I believe that this header is standard Unicode btw, not an extension!
Are you thinking of the byte order mark (BOM)? If so then this is a quirk
of UTF-16 and is a Windows thing that many apps can't handle correctly ...
UTF-8 should not have any headers at all and GCC should handle them fine.
Try using some text editor to check it really is UTF-8.


Windows Notepad still inserts a BOM (0xEF 0xBB 0xBF) at
the beginning of files encoded with UTF-8. See:

  http://www.microsoft.com/globaldev/getwr/steps/wrg_unicode.mspx
  http://en.wikipedia.org/wiki/Byte_Order_Mark

Ranjit.

- --
Ranjit Mathew      Email: rmathew AT gmail DOT com

Bangalore, INDIA.    Web: http://rmathew.com/


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFETfs0Yb1hx2wRS48RAkvmAKCae/o9vD3doaDKD1VPOSUlSlhRjACdGqv0
nD0cMiSvZLu9TfmIf/BUuIU=
=lZaM
-----END PGP SIGNATURE-----
