STINNER Victor added the comment:

str.translate() currently allocates a buffer of UCS4 characters.

translate_writer.patch:
- modify _PyUnicode_TranslateCharmap() to use the _PyUnicodeWriter API
- drop the optimizations for error handlers other than "ignore": there are no
unit tests for them, and str.translate() only uses "ignore". It's safer to
drop untested optimizations.
- also clean up the code: charmaptranslate_output() is now responsible for
handling the result of charmaptranslate_lookup() (including decrementing the
reference counter)
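For reference, here is a pure-Python model of the str.translate() semantics that _PyUnicode_TranslateCharmap() has to preserve. It is not the C code from the patch, only a sketch as I understand the code path: a LookupError from the table keeps the character unchanged, None is an "undefined mapping" that the "ignore" error handler drops, and an int or a str is written to the output. translate_model() is a hypothetical name, and the Python list only stands in for the _PyUnicodeWriter.

```python
def translate_model(s: str, table) -> str:
    """Pure-Python model of str.translate() (not CPython's implementation)."""
    out = []  # stands in for the output buffer / _PyUnicodeWriter
    for ch in s:
        try:
            value = table[ord(ch)]
        except LookupError:
            # No mapping found: 1:1 mapping, copy the character unchanged.
            out.append(ch)
            continue
        if value is None:
            # Undefined mapping: the "ignore" error handler drops the character.
            continue
        if isinstance(value, int):
            out.append(chr(value))  # ValueError if outside range(0x110000)
        elif isinstance(value, str):
            out.append(value)       # may be several characters, or none
        else:
            raise TypeError("character mapping must return integer, None or str")
    return "".join(out)


table = {ord("a"): "A", ord("b"): None, ord("c"): ord("C")}
print(translate_model("abcd", table))  # prints ACd
```

The model should agree with "abcd".translate(table), including for sequence tables, where an IndexError (a LookupError subclass) keeps the character.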

With the patch, str.translate() may be slightly faster when translating ASCII
to ASCII for large strings, but not by much.
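A minimal timing harness to check this on a given build, assuming one runs it on an interpreter built before and after applying the patch. The helper name, the input size and the uppercase table for the letters a-j are arbitrary choices for illustration, not part of the patch.

```python
import timeit


def time_ascii_translate(n_chars: int = 1_000_000, repeat: int = 5) -> float:
    """Return the best time (in seconds) of one ASCII-to-ASCII str.translate() call."""
    text = "abcdefghij" * (n_chars // 10)
    # Example table: uppercase the letters a-j (ASCII in, ASCII out).
    table = str.maketrans("abcdefghij", "ABCDEFGHIJ")
    timings = timeit.repeat(lambda: text.translate(table), number=1, repeat=repeat)
    return min(timings)


if __name__ == "__main__":
    print(f"best of 5: {time_ascii_translate():.4f} s")
```

Taking the minimum of several runs reduces noise from other processes; the absolute numbers depend on the machine.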

bytes.translate() is much faster because it uses a C array of 256 items for
fast table lookups, whereas str.translate() requires a Python dict lookup for
each character, which is much slower.
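The difference is visible in the tables themselves: bytes.maketrans() returns a flat 256-byte table indexed by byte value, while str.maketrans() returns a dict keyed by code point, which str.translate() has to query through a generic __getitem__ call for every character. The example strings below are arbitrary.

```python
# bytes.maketrans(): entry i is the replacement for byte value i,
# so each lookup is a plain array index.
btable = bytes.maketrans(b"abc", b"xyz")
print(len(btable))  # prints 256

# str.maketrans(): a dict mapping ordinals to ordinals (or to None when
# a third "delete" argument is given).
stable = str.maketrans("abc", "xyz")
print(stable)  # prints {97: 120, 98: 121, 99: 122}

print(b"aabbcc".translate(btable))  # prints b'xxyyzz'
print("aabbcc".translate(stable))   # prints xxyyzz

# bytes.translate() also accepts a set of bytes to delete.
print(b"aabbcc".translate(btable, b"b"))  # prints b'xxzz'
```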

codecs.charmap_build() (PyUnicode_BuildEncodingMap()) creates a C array ("a 
three-level trie") for fast lookup. It is used with codecs.charmap_encode() for 
8-bit encodings. We may reuse it for simple cases, like translating ASCII to 
ASCII.
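A rough sketch of both pieces. The first part calls codecs.charmap_build() and codecs.charmap_encode() the way 8-bit codecs do; the identity Latin-1 decoding table is only an example. The second part is a hypothetical ascii_translate_fast() helper showing what a "simple case" fast path could look like. Note that it does not reuse the charmap trie: it swaps in the 256-entry table of bytes.translate() instead, and falls back to str.translate() whenever the input is not ASCII or the table is not a plain dict of ASCII-to-ASCII (or deletion) mappings.

```python
import codecs

# 1) The C-level lookup structure used by 8-bit codecs.
#    Identity Latin-1 decoding table: byte i decodes to chr(i).
decoding_table = "".join(map(chr, range(256)))
encoding_map = codecs.charmap_build(decoding_table)
encoded, consumed = codecs.charmap_encode("caf\xe9", "strict", encoding_map)
print(encoded, consumed)  # prints b'caf\xe9' 4


# 2) Hypothetical ASCII-to-ASCII fast path (not CPython code).
def ascii_translate_fast(s: str, table) -> str:
    if type(table) is not dict or not s.isascii():
        return s.translate(table)
    lut = bytearray(range(256))  # identity mapping by default
    delete = bytearray()
    for key, value in table.items():
        if not (isinstance(key, int) and 0 <= key < 128):
            continue  # such keys can never match a character of an ASCII string
        if value is None:
            delete.append(key)
        elif isinstance(value, int) and 0 <= value < 128:
            lut[key] = value
        elif isinstance(value, str) and len(value) == 1 and value.isascii():
            lut[key] = ord(value)
        else:
            return s.translate(table)  # not a simple 1:1 ASCII mapping
    data = s.encode("ascii").translate(bytes(lut), bytes(delete))
    return data.decode("ascii")


print(ascii_translate_fast("hello world", str.maketrans("lo", "01", " ")))
# prints he001w1r0d
```

The fallback keeps the helper correct for every table that str.translate() accepts; only the simple cases take the byte-table path.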

----------
keywords: +patch
Added file: http://bugs.python.org/file34691/translate_writer.patch

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue21118>
_______________________________________
_______________________________________________
Python-bugs-list mailing list