Marc-Andre Lemburg added the comment:

The codec code has a few (performance) issues:

* nonspacing_diacritical_marks should be a set for fast lookup

* ord(c) in range(0x00, 0xA0) should be rewritten using < and >=

* result += bytes([ord(c)]) has quadratic timing (it copies the whole bytes string for every single operation); better use a bytearray and convert this to bytes in one final step

* the error messages should include more useful information about the cause and location of the error, instead of just UnicodeError("Unacceptable unicode character") and raise KeyError

Please also check whether it's possible to reuse the charmap codec functions we have.

Thanks.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue24339>
_______________________________________
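
For illustration only, the four points in the list above could look roughly like the sketch below. It is not the code from the patch under review. The names NONSPACING_DIACRITICAL_MARKS, ENCODING_MAP and encode_sketch, the codec name "iso6937-sketch" and the byte values in the table are all hypothetical placeholders, and the sketch does not attempt the real codec's handling of diacritic ordering.

```python
# Hypothetical stand-ins for the codec's tables (illustrative values only,
# not checked against the actual character set definition).
NONSPACING_DIACRITICAL_MARKS = frozenset({"\u0300", "\u0301", "\u0302"})
ENCODING_MAP = {"\u0300": 0xC1, "\u0301": 0xC2, "\u0302": 0xC3}


def encode_sketch(text, encoding_name="iso6937-sketch"):
    """Encode text to bytes, one output byte per input character."""
    out = bytearray()  # mutable buffer: appending does not copy the whole result
    for pos, c in enumerate(text):
        code = ord(c)
        if code >= 0x00 and code < 0xA0:
            # Plain comparisons instead of "ord(c) in range(0x00, 0xA0)".
            out.append(code)
        elif c in NONSPACING_DIACRITICAL_MARKS:
            # Set membership is a hash lookup, not a linear scan.
            out.append(ENCODING_MAP[c])
        else:
            # Report the codec name, the input, the position and a reason,
            # rather than a bare UnicodeError or an accidental KeyError.
            raise UnicodeEncodeError(
                encoding_name, text, pos, pos + 1,
                "character has no mapping in this sketch",
            )
    return bytes(out)  # convert to bytes once, in one final step
```

Raising UnicodeEncodeError with start and end positions lets callers (and the standard error handlers) see exactly which character failed.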