New submission from Ezio Melotti <ezio.melo...@gmail.com>:

I noticed that codec names[1]:
 1) can contain random/unnecessary spaces and punctuation;
 2) have several aliases that could probably be removed.

A few examples of valid codec names (done with Python 3):
>>> s = 'xxx'
>>> s.encode('utf')
b'xxx'
>>> s.encode('utf-')
b'xxx'
>>> s.encode('}Utf~->8<-~siG{ ;)')
b'\xef\xbb\xbfxxx'

'utf' is an alias for UTF-8, and it doesn't quite make sense to me that 'utf' alone refers to UTF-8. 'utf-' could be a mistyped 'utf-8', 'utf-7' or even 'utf-16'; I'd like it to raise an error instead. The third example is probably not something that can be found in the real world (I hope), but it shows how permissive the parsing of the names is. Apparently the whitespace is removed, the punctuation is used to split the name into several parts, and only then is the check performed.

About the aliases: in the documentation the "official" name for the UTF-8 codec is 'utf_8' and there are 3 more aliases: U8, UTF, utf8. For ISO-8859-1, the "official" name is 'latin_1' and there are 7 more aliases: iso-8859-1, iso8859-1, 8859, cp819, latin, latin1, L1.

The Zen says "There should be one-- and preferably only one --obvious way to do it.", so I suggest to:
 1) disallow random punctuation and spaces within the name (only allow leading and trailing spaces);
 2) change the default names to, for example, 'utf-8' and 'iso-8859-1' instead of 'utf_8' and 'iso8859_1'. The names are case-insensitive;
 3) remove the unnecessary aliases, for example 'UTF' and 'U8' for UTF-8, and 'iso8859-1', '8859', 'latin' and 'L1' for ISO-8859-1.

This last point could break some code and may need a DeprecationWarning. If there are good reasons to keep these aliases around, only the other two issues need to be addressed. If the name of the codec has to be a valid variable name (that is, without '-'), only the documentation could be changed to have 'utf-8', 'iso-8859-1', etc. as the preferred names.
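
To make point 1 more concrete, here is a rough sketch of what a stricter check could look like. It is only an illustration: strict_lookup and _STRICT_NAME are hypothetical names, not part of the codecs module, and the exact set of allowed separators ('-', '_' and '.') is my assumption. The check runs before the name is handed to the existing codecs.lookup().

```python
import codecs
import re

# Hypothetical rule: runs of ASCII letters/digits joined by a single
# '-', '_' or '.'. No leading, trailing or doubled separators.
_STRICT_NAME = re.compile(r"[A-Za-z0-9]+(?:[-_.][A-Za-z0-9]+)*")


def strict_lookup(name):
    """Look up a codec, rejecting names with stray punctuation or inner spaces.

    Leading and trailing whitespace is still tolerated, as suggested in
    point 1. This is a sketch, not the behavior of codecs.lookup().
    """
    candidate = name.strip()
    if not _STRICT_NAME.fullmatch(candidate):
        raise ValueError("malformed codec name: %r" % name)
    # Delegate to the normal lookup; unknown names still raise LookupError.
    return codecs.lookup(candidate)


if __name__ == "__main__":
    print(strict_lookup(" utf-8 ").name)
    for bad in ("utf-", "}Utf~->8<-~siG{ ;)"):
        try:
            strict_lookup(bad)
        except ValueError as exc:
            print("rejected:", exc)
```

Note that this only covers the syntax of the name. A bare 'utf' would still pass the check and resolve through the alias table, so point 3 would have to be handled separately.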

[1]: http://docs.python.org/library/codecs.html#standard-encodings
     http://docs.python.org/3.0/library/codecs.html#standard-encodings

----------
assignee: georg.brandl
components: Documentation, Library (Lib)
messages: 86933
nosy: ezio.melotti, georg.brandl
severity: normal
status: open
title: Stricter codec names
type: behavior
versions: Python 2.6, Python 2.7, Python 3.0, Python 3.1

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue5902>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com