[issue1328] feature request: force BOM option

Adam Olsen Thu, 01 Nov 2007 11:07:43 -0800

Adam Olsen added the comment:

The problem with "being tolerate" as you suggest is that you lose the ability
to round-trip.  Read in a file using the UTF-8 signature, write it back
out, and suddenly nothing else can open it.
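
As a rough illustration of the round-trip problem (a sketch only, assuming
the tolerant reader behaves like the existing "utf-8-sig" codec and the
writer uses plain "utf-8"; the sample bytes are made up):

```python
import codecs

# Hypothetical file contents: UTF-8 signature followed by some text.
original = codecs.BOM_UTF8 + "hello".encode("utf-8")

# Tolerant read: utf-8-sig strips the signature if it is present.
text = original.decode("utf-8-sig")

# Write back with plain utf-8: the signature is silently dropped.
rewritten = text.encode("utf-8")

had_signature = original.startswith(codecs.BOM_UTF8)
has_signature = rewritten.startswith(codecs.BOM_UTF8)
print(had_signature, has_signature)
```

Any consumer that relies on the signature to recognise the file would no
longer see it in the rewritten bytes.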


Conceptually, these signatures shouldn't even be part of the encoding;
they're a prefix in the file indicating which encoding to use.

Note that the BOM signature (ZWNBSP) is a valid code point.  Although it
seems unlikely for a file to start with ZWNBSP, if you were to chop a file
up into smaller chunks and decode them individually you'd be more likely
to run into it.  (However, it seems general use of ZWNBSP is being
discouraged precisely due to this potential for confusion[1]).
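
A small sketch of the chunking case, again using "utf-8-sig" as a stand-in
for a tolerant decoder (the sample text and the split point are chosen by
hand so that the second chunk happens to begin with ZWNBSP):

```python
# Text with a ZWNBSP (U+FEFF) in the middle, not at the start.
text = "ab\ufeffcd"
data = text.encode("utf-8")

# Split so that the second chunk starts with the bytes EF BB BF.
chunks = [data[:2], data[2:]]

# A signature-stripping decoder treats that ZWNBSP as a BOM and drops it.
tolerant = "".join(chunk.decode("utf-8-sig") for chunk in chunks)

# A plain decoder keeps every code point.
strict = "".join(chunk.decode("utf-8") for chunk in chunks)

print(repr(tolerant), repr(strict))
```

Here the tolerant decoding loses a code point that was really part of the
data, while the strict decoding reproduces the original text.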

In summary, guessing the encoding should never be the default.  Although
it may be appropriate in some contexts, we must ensure we emit the right
encoding for those contexts as well. [2]

[1] http://unicode.org/faq/utf_bom.html#38
[2] http://unicode.org/faq/utf_bom.html#28

----------
nosy: +rhamphoryncus

__________________________________
Tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue1328>
__________________________________