Re: Eclipse/PyDev - BOM Lexical Error

Ethan Furman Fri, 08 Oct 2010 09:59:55 -0700

Lawrence D'Oliveiro wrote:

In message <[email protected]>, Diez B. Roggisch wrote:

Lawrence D'Oliveiro <[email protected]_zealand> writes:

In message <[email protected]>, Diez B. Roggisch wrote:

Lawrence D'Oliveiro <[email protected]_zealand> writes:

What exactly is the point of a BOM in a UTF-8-encoded file?

It's a marker like the "coding: utf-8" in Python files. It tells software
that is aware of it that the content is UTF-8.
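How Python itself weighs the two markers can be sketched with the standard `tokenize.detect_encoding` function (Python 3). The helper name `source_encoding` and the sample byte strings are illustrative only. The function reports `utf-8-sig` when the source starts with a BOM. Otherwise it uses the `coding:` cookie, and it falls back to UTF-8 when there is neither:

```python
import io
import tokenize


def source_encoding(raw: bytes) -> str:
    """Return the encoding Python would use for source bytes `raw` (illustrative helper)."""
    # detect_encoding calls readline at most twice, looking for a BOM and a coding cookie
    encoding, _lines = tokenize.detect_encoding(io.BytesIO(raw).readline)
    return encoding


with_bom = b"\xef\xbb\xbfprint('hi')\n"
with_cookie = b"# -*- coding: utf-8 -*-\nprint('hi')\n"
with_latin1_cookie = b"# coding: latin-1\nprint('hi')\n"
plain = b"print('hi')\n"

print(source_encoding(with_bom))            # utf-8-sig
print(source_encoding(with_cookie))         # utf-8
print(source_encoding(with_latin1_cookie))  # iso-8859-1
print(source_encoding(plain))               # utf-8
```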

But if the software is aware of it, then why does it need to be told?

Let me rephrase: Windows editors such as Notepad recognize the BOM, and
then assume (hopefully rightly so) that the rest of the file is text
in UTF-8 encoding.

But they can only recognize it as a BOM if they assume UTF-8 encoding to begin with. Otherwise it could be interpreted as some other coding.

Not so. The first three bytes are the flag. For example, in a .dbf file, the first byte determines what type of dbf the file is: \x03 = dBase III, \x83 = dBase III with memos, etc. More checking should naturally be done to ensure the rest of the fields make sense for the dbf type specified.
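The point that a BOM is matched as raw bytes, before any decoding is chosen, can be illustrated with a minimal sketch using the BOM constants from Python's `codecs` module. The function `sniff_bom` and its lookup table are hypothetical, not the actual logic of Notepad or any other editor:

```python
import codecs
from typing import Optional, Tuple

# Ordered longest-first: the UTF-32-LE BOM (FF FE 00 00) begins with the
# UTF-16-LE BOM (FF FE), so the shorter one must be tried afterwards.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes) -> Tuple[Optional[str], int]:
    """Return (encoding name, BOM length) for a leading BOM, or (None, 0)."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


print(sniff_bom(codecs.BOM_UTF8 + b"print(1)\n"))  # ('utf-8', 3)
print(sniff_bom(b"print(1)\n"))                   # (None, 0)
```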

MS decided that if the first three bytes = \xEF \xBB \xBF then it's a UTF-8 file, and if it is not, don't open it with an MS product. Likewise, MS will add those bytes to any UTF-8 file it saves.
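The write-a-BOM and read-a-BOM behaviour described above can be reproduced in Python with the `utf-8-sig` codec. This is a small sketch and the sample string is arbitrary:

```python
import codecs

text = "caf\u00e9\n"

# "utf-8-sig" prepends EF BB BF when encoding, like an editor that always
# writes a BOM; plain "utf-8" writes no BOM.
with_bom = text.encode("utf-8-sig")
without_bom = text.encode("utf-8")
assert with_bom == codecs.BOM_UTF8 + without_bom

# Plain "utf-8" decoding keeps the BOM as U+FEFF at the start of the string,
# a stray character that tools not expecting a BOM may choke on.
print(repr(with_bom.decode("utf-8")))         # '\ufeffcafé\n'
# "utf-8-sig" strips a leading BOM and is harmless when none is present.
print(repr(with_bom.decode("utf-8-sig")))     # 'café\n'
print(repr(without_bom.decode("utf-8-sig")))  # 'café\n'
```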

Naturally, this causes problems for non-MS usages, but anybody who's had to work with both MS and non-MS platforms/products/methodologies knows that MS does not play well with others.


~Ethan~
--
http://mail.python.org/mailman/listinfo/python-list

