[issue11322] encoding package's normalize_encoding() function is too slow

STINNER Victor Fri, 25 Feb 2011 15:03:13 -0800

STINNER Victor <victor.stin...@haypocalc.com> added the comment:

We should first make the 3 normalization functions implement the same algorithm,
and add tests for them (at least for the function in the encodings package):

 - normalize_encoding() in encodings: it doesn't convert to lowercase and keeps
non-ASCII letters
 - normalize_encoding() in unicodeobject.c
 - normalizestring() in codecs.c
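Tests for this could run every normalizer over a shared list of encoding names and report the names on which they disagree. The C functions in unicodeobject.c and codecs.c cannot be called directly from Python, so this minimal sketch uses hypothetical stand-in normalizers. The helper name find_disagreements is illustrative only and is not an existing CPython API.

```python
from typing import Callable, Dict, Iterable, List

# A normalizer maps a raw encoding name to its normalized form.
Normalizer = Callable[[str], str]


def find_disagreements(
    normalizers: Dict[str, Normalizer], inputs: Iterable[str]
) -> List[str]:
    """Return every input on which the normalizers do not all agree."""
    mismatches = []
    for name in inputs:
        # Collect each normalizer's result for this encoding name.
        results = {label: func(name) for label, func in normalizers.items()}
        # More than one distinct result means at least two normalizers differ.
        if len(set(results.values())) > 1:
            mismatches.append(name)
    return mismatches


if __name__ == "__main__":
    # Hypothetical stand-ins, for illustration only: one keeps case,
    # the other lowercases.
    candidates = {
        "keep_case": lambda s: s.strip(),
        "lower_case": lambda s: s.strip().lower(),
    }
    print(find_disagreements(candidates, ["utf-8", "UTF-8", "  latin-1 "]))
```

With these stand-ins, only "UTF-8" is reported, because the two normalizers differ only in case handling. A real test suite would plug in the actual functions, or Python-level wrappers around them, as the values of the dictionary.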

normalize_encoding() in encodings is more lax than the two other functions:
it normalizes "  utf   8  " to 'utf_8'. But it doesn't convert to lowercase and
keeps non-ASCII letters: "UTF-8é" is normalized to 'UTF_8é'.
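The lax behaviour described above can be reproduced with a short loop. This is a minimal sketch written for illustration, not the encodings module's source. The name lax_normalize is hypothetical, and the sketch assumes these rules: alphanumeric characters and '.' are kept unchanged, and every other run of characters collapses to a single underscore, which is dropped at the start and end of the name.

```python
def lax_normalize(encoding: str) -> str:
    """Sketch of the lax normalization described above (hypothetical name).

    Alphanumeric characters and '.' are kept as-is: there is no lowercasing,
    and non-ASCII letters such as 'é' survive because str.isalnum() accepts
    them. Any run of other characters becomes a single '_', except at the
    start or end of the name, where it is dropped.
    """
    chars = []
    pending_separator = False
    for c in encoding:
        if c.isalnum() or c == ".":
            # Emit one '_' for the preceding run of spaces/punctuation,
            # but only if something has already been written.
            if pending_separator and chars:
                chars.append("_")
            chars.append(c)
            pending_separator = False
        else:
            pending_separator = True
    # A trailing run leaves pending_separator set but emits nothing.
    return "".join(chars)


print(lax_normalize("  utf   8  "))  # utf_8
print(lax_normalize("UTF-8é"))       # UTF_8é
```

Both outputs match the examples in the comment: leading and trailing spaces vanish, inner runs become one underscore, and neither case nor the non-ASCII 'é' is touched.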

I don't know if the normalization functions have to be more or less strict, but 
I think that they should all give the same result.

----------
nosy: +haypo

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue11322>
_______________________________________