Lawrence D'Oliveiro wrote:
In message <mailman.1533.1286774527.29448.python-l...@python.org>, Ethan Furman wrote:


Lawrence D'Oliveiro wrote:


In message <mailman.1466.1286556950.29448.python-l...@python.org>, Ethan Furman wrote:


Lawrence D'Oliveiro wrote:


But they can only recognize it as a BOM if they assume UTF-8 encoding to
begin with. Otherwise it could be interpreted as some other coding.

Not so.  The first three bytes are the flag.

But this is just a text file. All parts of its contents are text, there
is no “flag”.

If you think otherwise, then tell us what are these three “flag” bytes
for a Windows-1252-encoded text file?

MS treats those first three bytes as a flag -- if they equal the BOM, MS
treats it as UTF-8, if they equal anything else, MS does not treat it as
UTF-8.


So what does it treat it as? You previously gave examples of flag values for dBase III. What are the flag values for Windows-1252, versus, say, ISO-8859-15?

I am not aware of any other flag values for text files besides the BOM for UTF-8. If the BOM is not there, I imagine MS defaults to whatever the locale for that machine is, but I do not know for sure.
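
For illustration, the behaviour I am describing could be sketched in Python roughly as follows. This is only my guess at the logic, not anything taken from MS code; the function name guess_encoding is made up, and the fallback to the locale's preferred encoding is the assumption I mentioned above.

```python
import codecs
import locale


def guess_encoding(first_bytes: bytes) -> str:
    """Guess a text file's encoding from its leading bytes (hypothetical helper)."""
    # codecs.BOM_UTF8 is the three-byte sequence EF BB BF.
    if first_bytes.startswith(codecs.BOM_UTF8):
        # "utf-8-sig" decodes as UTF-8 and strips the leading BOM.
        return "utf-8-sig"
    # Assumption: with no BOM, fall back to the machine's locale encoding.
    return locale.getpreferredencoding(False)


with_bom = codecs.BOM_UTF8 + "hello".encode("utf-8")
print(guess_encoding(with_bom))    # BOM present, so treated as UTF-8
print(guess_encoding(b"hello"))    # no BOM, so locale-dependent
```

Lawrence's point still shows up here: the same three bytes are also valid Windows-1252 text, so the check is a convention rather than a guarantee. Decoding codecs.BOM_UTF8 as "cp1252" gives three ordinary characters instead of an error.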

~Ethan~
--
http://mail.python.org/mailman/listinfo/python-list
