New submission from Serhiy Storchaka: In Python 2 PyUnicode_FromObject() was used for coercing 8-bit strings to unicode by decoding them with the default encoding. But in Python 3 there is no such coercing. The effect of PyUnicode_FromObject() in Python 3 is ensuring that the argument is a string and convert an instance of str subtype to exact str. The latter often is just a waste of memory and time, since resulted string is used only for retrieving UTF-8 representation or raw data.
Proposed patch makes following things: 1. Avoids unneeded copying of string's content. 2. Avoids raising some unneeded exceptions. 3. Gets rid of unneeded incref/decref. 4. Makes some error messages more correct or informative. 5. Converts runtime checks PyBytes_Check() for results of string encoding to asserts. Example of performance gain: Unpatched: $ ./python -m timeit -s "a = 'a'*100; b = 'b'*1000" -- "a in b" 1000000 loops, best of 3: 0.404 usec per loop $ ./python -m timeit -s "class S(str): pass" -s "a = S('a'*100); b = S('b'*1000)" -- "a in b" 1000000 loops, best of 3: 0.723 usec per loop Patched: $ ./python -m timeit -s "a = 'a'*100; b = 'b'*1000" -- "a in b" 1000000 loops, best of 3: 0.383 usec per loop $ ./python -m timeit -s "class S(str): pass" -s "a = S('a'*100); b = S('b'*1000)" -- "a in b" 1000000 loops, best of 3: 0.387 usec per loop ---------- components: Interpreter Core files: no_unicode_copy.patch keywords: patch messages: 257806 nosy: haypo, ncoghlan, serhiy.storchaka priority: normal severity: normal stage: patch review status: open title: Avoid nonneeded use of PyUnicode_FromObject() type: enhancement versions: Python 3.6 Added file: http://bugs.python.org/file41541/no_unicode_copy.patch _______________________________________ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue26057> _______________________________________ _______________________________________________ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com