New submission from STINNER Victor <victor.stin...@haypocalc.com>: The iobench benchmarking tool showed that the UTF-8 encoder is slower in Python 3.3 than in Python 3.2. The performance depends on the characters of the input string:
* 8x faster (!) for a string of 50,000 ASCII characters
* 1.5x slower for a string of 50,000 UCS-1 characters
* 2.5x slower for a string of 50,000 UCS-2 characters

The bottleneck looks to be the PyUnicode_READ() macro:

* Python 3.2: s[i++]
* Python 3.3: PyUnicode_READ(kind, data, i++)

Because encoding a string to UTF-8 is a very common operation, performance matters. Antoine suggests having a different version of the function for each Unicode kind (1, 2, 4).

----------
components: Unicode
messages: 149695
nosy: ezio.melotti, haypo, pitrou
priority: normal
severity: normal
status: open
title: UTF-8 encoder performance regression in python3.3
type: performance
versions: Python 3.3

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue13624>
_______________________________________
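To illustrate the suggestion, here is a minimal C sketch (not CPython source; all names are illustrative). Since PEP 393, a string stores its code points in 1-, 2- or 4-byte units (its "kind"), so a generic PyUnicode_READ-style accessor must branch on the kind for every character. A per-kind encoder loop hoists that branch out of the hot loop; the UCS-1 case is shown:

```c
#include <stddef.h>
#include <stdint.h>

/* Kinds as in the flexible string representation (illustrative enum). */
typedef enum { KIND_1 = 1, KIND_2 = 2, KIND_4 = 4 } kind_t;

/* Generic read: one switch per character, as in PyUnicode_READ(kind, data, i). */
static inline uint32_t read_generic(kind_t kind, const void *data, size_t i)
{
    switch (kind) {
    case KIND_1: return ((const uint8_t  *)data)[i];
    case KIND_2: return ((const uint16_t *)data)[i];
    default:     return ((const uint32_t *)data)[i];
    }
}

/* Specialized encoder for kind-1 (UCS-1) input: the character read is a
 * direct array access, no kind switch inside the loop.  Code points
 * <= 0xFF encode to at most 2 UTF-8 bytes.  Returns bytes written;
 * `out` must hold at least 2 * len bytes. */
static size_t utf8_encode_ucs1(const uint8_t *s, size_t len, uint8_t *out)
{
    size_t o = 0;
    for (size_t i = 0; i < len; i++) {
        uint32_t ch = s[i];                       /* direct read */
        if (ch < 0x80) {
            out[o++] = (uint8_t)ch;               /* 1-byte sequence */
        } else {
            out[o++] = (uint8_t)(0xC0 | (ch >> 6));        /* lead byte */
            out[o++] = (uint8_t)(0x80 | (ch & 0x3F));      /* continuation */
        }
    }
    return o;
}
```

Analogous utf8_encode_ucs2/utf8_encode_ucs4 loops would cover the other kinds; the ASCII case can additionally use a plain memcpy fast path, which matches the 8x speedup observed for ASCII input.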