Adam Olsen added the comment: The problem with "being tolerant" as you suggest is that you lose the ability to round-trip. Read in a file using the UTF-8 signature, write it back out, and suddenly nothing else can open it.
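A minimal sketch of that round-trip loss, assuming the "tolerant" reader behaves like Python's existing "utf-8-sig" codec (strip the signature on decode) and that the file is written back as plain "utf-8". The sample bytes are made up for illustration:

```python
import codecs

# Hypothetical file contents: the UTF-8 signature followed by ASCII text.
original = codecs.BOM_UTF8 + b"hello\n"

# A signature-tolerant reader strips the signature while decoding.
text = original.decode("utf-8-sig")

# Writing back as plain UTF-8 silently drops the signature.
rewritten = text.encode("utf-8")

# Only the signature-aware codec reproduces the original bytes.
round_tripped = text.encode("utf-8-sig")

print(rewritten == original)      # False
print(round_tripped == original)  # True
```

The decoded text alone cannot tell the writer whether a signature was present, which is why the choice has to be made explicitly on output.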
Conceptually, these signatures shouldn't even be part of the encoding; they're a prefix in the file indicating which encoding to use.

Note that the BOM signature (ZWNBSP) is a valid code point. Although it seems unlikely for a file to start with ZWNBSP, if you were to chop a file up into smaller chunks and decode them individually, you'd be more likely to run into it. (However, it seems general use of ZWNBSP is being discouraged precisely due to this potential for confusion [1].)

In summary, guessing the encoding should never be the default. Although it may be appropriate in some contexts, we must ensure we emit the right encoding for those contexts as well [2].

[1] http://unicode.org/faq/utf_bom.html#38
[2] http://unicode.org/faq/utf_bom.html#28

----------
nosy: +rhamphoryncus

__________________________________
Tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue1328>
__________________________________