New submission from STINNER Victor <victor.stin...@haypocalc.com>: The iobench benchmarking tool showed that the UTF-8 encoder is slower in Python 3.3 than in Python 3.2. The performance depends on the characters of the input string:
* 8x faster (!) for a string of 50,000 ASCII characters
* 1.5x slower for a string of 50,000 UCS-1 characters
* 2.5x slower for a string of 50,000 UCS-2 characters

The bottleneck looks to be the PyUnicode_READ() macro:

* Python 3.2: s[i++]
* Python 3.3: PyUnicode_READ(kind, data, i++)

Because encoding a string to UTF-8 is a very common operation, performance matters. Antoine suggests having a different version of the function for each Unicode kind (1, 2, 4).

----------
components: Unicode
messages: 149695
nosy: ezio.melotti, haypo, pitrou
priority: normal
severity: normal
status: open
title: UTF-8 encoder performance regression in python3.3
type: performance
versions: Python 3.3

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue13624>
_______________________________________
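To illustrate the suggestion, here is a minimal C sketch (not CPython source; all names are illustrative). Since PEP 393, a string stores its code points in 1-, 2- or 4-byte units (its "kind"), so a generic PyUnicode_READ-style accessor must branch on the kind for every character. A per-kind encoder loop hoists that branch out of the hot loop; the UCS-1 case is shown:

```c
#include <stddef.h>
#include <stdint.h>

/* Kinds as in the flexible string representation (illustrative enum). */
typedef enum { KIND_1 = 1, KIND_2 = 2, KIND_4 = 4 } kind_t;

/* Generic read: one switch per character, as in PyUnicode_READ(kind, data, i). */
static inline uint32_t read_generic(kind_t kind, const void *data, size_t i)
{
    switch (kind) {
    case KIND_1: return ((const uint8_t  *)data)[i];
    case KIND_2: return ((const uint16_t *)data)[i];
    default:     return ((const uint32_t *)data)[i];
    }
}

/* Specialized encoder for kind-1 (UCS-1) input: the character read is a
 * direct array access, no kind switch inside the loop.  Code points
 * <= 0xFF encode to at most 2 UTF-8 bytes.  Returns bytes written;
 * `out` must hold at least 2 * len bytes. */
static size_t utf8_encode_ucs1(const uint8_t *s, size_t len, uint8_t *out)
{
    size_t o = 0;
    for (size_t i = 0; i < len; i++) {
        uint32_t ch = s[i];                       /* direct read */
        if (ch < 0x80) {
            out[o++] = (uint8_t)ch;               /* 1-byte sequence */
        } else {
            out[o++] = (uint8_t)(0xC0 | (ch >> 6));        /* lead byte */
            out[o++] = (uint8_t)(0x80 | (ch & 0x3F));      /* continuation */
        }
    }
    return o;
}
```

Analogous utf8_encode_ucs2/utf8_encode_ucs4 loops would cover the other kinds; the ASCII case can additionally use a plain memcpy fast path, which matches the 8x speedup observed for ASCII input.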