STINNER Victor <victor.stin...@haypocalc.com> added the comment:

>> That won't work, Victor, since it makes invalid encoding
>> names valid, e.g. 'utf(=)-8'.
> .. but this *is* valid: ...

Ah yes, it's because of encodings.normalize_encoding(). It's funny: we have 3 functions to normalize an encoding name, and each one does something different :-) For example, encodings.normalize_encoding() doesn't replace non-ASCII letters and doesn't convert to lowercase.

more_aggressive_normalization.patch changes all 3 normalization functions and adds tests for encodings.normalize_encoding().

I think that speed and backward compatibility are more important than conforming to IANA or other standards. Even if "~~ utf#8 ~~" is ugly, I don't think it really matters that we accept it.

--

If you don't want to touch the normalization functions and just want to add more aliases to the C fast paths, we should also add utf8, utf16 and utf32.

Places where the standard library uses "utf8": random.Random.seed(), smtpd.SMTPChannel.collect_incoming_data(), tarfile, multiprocessing.connection (XML serialization).

PS: On error, the UTF-8 decoder raises a UnicodeDecodeError with "utf8" as the encoding name :-)

----------
Added file: http://bugs.python.org/file20880/more_aggressive_normalization.patch

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue11303>
_______________________________________
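For anyone reproducing the 'utf(=)-8' and "~~ utf#8 ~~" examples from this comment, here is a quick check of how the two Python-level entry points treat such names. This is a sketch run against a current Python 3 interpreter, not against the patched 3.2-era code being discussed, so exact results may differ between versions.

```python
import codecs
import encodings

names = ["utf-8", "UTF-8", "utf(=)-8", "~~ utf#8 ~~", "utf8"]

# encodings.normalize_encoding() collapses runs of non-alphanumeric
# characters (other than '.') into a single underscore and drops them at
# the ends of the name; it does not lowercase the result.
for name in names:
    print(repr(name), "->", repr(encodings.normalize_encoding(name)))

# codecs.lookup() also lowercases the name before searching, so all of
# these spellings resolve to the same codec object.
for name in names:
    print(repr(name), "->", codecs.lookup(name).name)
```

On a current interpreter, the normalize_encoding() calls should print 'utf_8' for every input except 'UTF-8' (which keeps its case as 'UTF_8') and 'utf8' (which has no punctuation to collapse), while every codecs.lookup() call should report 'utf-8'.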