[issue5902] Stricter codec names

2011-02-24 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment:
Alexander Belopolsky wrote:
> Ezio and I discussed on IRC the implementation of alias lookup and neither of us was able to point to the function that strips non-alphanumeric characters from encoding

[issue5902] Stricter codec names

2011-02-24 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment:
Alexander Belopolsky wrote:
>> Accepting all common forms for encoding names means that you can usually give Python an encoding name from, e.g. a HTML page, or any other file or system that specifies an

[issue5902] Stricter codec names

2011-02-24 Thread Marc-Andre Lemburg
Changes by Marc-Andre Lemburg: status: open -> closed

[issue5902] Stricter codec names

2011-02-24 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment:
Alexander Belopolsky wrote:
> What is the status of this? Status=open and Resolution=rejected contradict each other.
Sorry, forgot to close the ticket.

[issue5902] Stricter codec names

2011-02-23 Thread Alexander Belopolsky
Alexander Belopolsky added the comment: Ezio and I discussed on IRC the implementation of alias lookup and neither of us was able to point to the function that strips non-alphanumeric characters from encoding names. It turns out that there are three "normalize" functions that are successi

[issue5902] Stricter codec names

2011-02-23 Thread Alexander Belopolsky
Alexander Belopolsky added the comment:
> Accepting all common forms for encoding names means that you can usually give Python an encoding name from, e.g. a HTML page, or any other file or system that specifies an encoding.
I don't buy this argument. Running attached script on http://ww

[issue5902] Stricter codec names

2011-02-23 Thread Alexander Belopolsky
Alexander Belopolsky added the comment: What is the status of this? Status=open and Resolution=rejected contradict each other. This discussion is relevant for issue11303. Currently alias lookup incurs a huge performance penalty in some cases.
nosy: +belopolsky

[issue5902] Stricter codec names

2009-05-05 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment:
On 2009-05-04 19:04, Georg Brandl wrote:
> So, do you also think "utf" and "latin" should stay?
For Python 3.x, I think those can be removed. For 2.x it's better to keep them. Note that UTF-8 was the first official Uni

[issue5902] Stricter codec names

2009-05-04 Thread Matthew Barnett
Matthew Barnett added the comment: Well, there are multiple UTF encodings, so no to "utf". Are there multiple Latin encodings? Not in Python 2.6.2 under those names. I'd probably insist on names that are strictish(?), i.e. correct, give or take a '-' or '_'.

[issue5902] Stricter codec names

2009-05-04 Thread Georg Brandl
Georg Brandl added the comment: So, do you also think "utf" and "latin" should stay?

[issue5902] Stricter codec names

2009-05-04 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment:
On 2009-05-02 11:20, Georg Brandl wrote:
> I don't think this is a good idea. Accepting all common forms for encoding names means that you can usually give Python an encoding name from, e.g. a HTML page, or any othe

[issue5902] Stricter codec names

2009-05-03 Thread Ezio Melotti
Ezio Melotti added the comment: Actually I'd like to have some kind of convention mainly when the user writes the encoding as a string, e.g. s.encode('utf-8'). Indeed, if the encoding comes from a webpage or somewhere else it makes sense to have some flexibility. I think that 'utf-8' is the mos

[issue5902] Stricter codec names

2009-05-02 Thread Matthew Barnett
Matthew Barnett added the comment: How about a 'full' form and a 'key' form generated by the function:

def codec_key(name):
    return name.lower().replace("-", "").replace("_", "")

The key form would be the key to an available codec, and the key generated by a user-supplied codec name would h
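A minimal sketch of how the proposed key form might be used for lookup. The codec_key function is the one from the comment; the AVAILABLE_CODECS table and the find_codec helper are hypothetical names introduced only for illustration and are not part of Python's codec registry.

```python
def codec_key(name):
    """Reduce a codec name to the 'key' form proposed in the comment."""
    return name.lower().replace("-", "").replace("_", "")


# Hypothetical table mapping key forms to 'full' names (illustration only,
# not Python's real registry).
AVAILABLE_CODECS = {
    codec_key(full): full for full in ("UTF-8", "UTF-16", "ISO-8859-1")
}


def find_codec(user_name):
    """Return the full codec name for a user-supplied spelling, or None."""
    return AVAILABLE_CODECS.get(codec_key(user_name))


if __name__ == "__main__":
    for spelling in ("utf_8", "Utf8", "iso8859_1", "utf", "utf 8"):
        print(repr(spelling), "->", find_codec(spelling))
```

Under this scheme a spelling matches only if it differs from the full name by case, '-' or '_', so ambiguous names such as "utf" and names containing spaces are not accepted.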

[issue5902] Stricter codec names

2009-05-02 Thread Antoine Pitrou
Antoine Pitrou added the comment: Is there any reason for allowing "utf" as an alias to utf-8? It sounds much too ambiguous. The other silly variants (those with lots of spurious punctuation characters) could be forbidden too.
nosy: +pitrou; status: pending -> open

[issue5902] Stricter codec names

2009-05-02 Thread Georg Brandl
Georg Brandl added the comment: I don't think this is a good idea. Accepting all common forms for encoding names means that you can usually give Python an encoding name from, e.g. a HTML page, or any other file or system that specifies an encoding. If we only supported, e.g., "UTF-8" and no ot
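A rough sketch of the use case described here: the encoding name comes from an external source (a Content-Type header in this example) and is passed to Python unchanged. The helper name decode_with_declared_charset and the fallback to "utf-8" are assumptions made for illustration; they do not appear in the discussion.

```python
import codecs
from email.message import Message


def decode_with_declared_charset(body, content_type):
    """Decode `body` using the charset named in a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = content_type
    # Falling back to UTF-8 when no charset is declared is an assumption
    # of this sketch, not something the discussion specifies.
    charset = msg.get_content_charset() or "utf-8"
    # codecs.lookup() raises LookupError if Python does not know the name.
    codecs.lookup(charset)
    return body.decode(charset)


if __name__ == "__main__":
    print(decode_with_declared_charset(b"caf\xe9", "text/html; charset=ISO-8859-1"))
```

The lenient name matching under discussion is what allows the externally supplied spelling to be used without an application-level translation table.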

[issue5902] Stricter codec names

2009-05-02 Thread Ezio Melotti
New submission from Ezio Melotti:

I noticed that codec names[1]:
1) can contain random/unnecessary spaces and punctuation;
2) have several aliases that could probably be removed;

A few examples of valid codec names (done with Python 3):
>>> s = 'xxx'
>>> s.encode('utf')
b'xxx'
>>> s.encode('utf
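To see which spellings a given interpreter accepts, one option is to ask the codec registry directly. This is a small sketch using codecs.lookup from the standard library; the helper name canonical_name is introduced here for illustration, and the set of accepted aliases may differ between Python versions.

```python
import codecs


def canonical_name(spelling):
    """Return the name of the codec that `spelling` resolves to, or None."""
    try:
        return codecs.lookup(spelling).name
    except LookupError:
        return None


if __name__ == "__main__":
    for spelling in ("utf-8", "UTF8", "utf_8", "utf", "latin-1", "latin1", "latin"):
        print(repr(spelling), "->", canonical_name(spelling))
```

Spellings that resolve to the same name are aliases of one codec; a result of None means the registry rejected the spelling.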