[issue13624] UTF-8 encoder performance regression in python3.3

2011-12-18 Thread STINNER Victor
STINNER Victor added the comment:
> It's actually still O(n): the UTF-8 data still need to be copied
> into a bytes object.
Hum, correct, but a memory copy is much faster than having to decode UTF-8.

[issue13624] UTF-8 encoder performance regression in python3.3

2011-12-18 Thread Martin v . Löwis
Martin v. Löwis added the comment:
> Oooh, it's just faster because encoding ASCII to UTF-8 is now O(1)
It's actually still O(n): the UTF-8 data still need to be copied into a bytes object.

[issue13624] UTF-8 encoder performance regression in python3.3

2011-12-18 Thread STINNER Victor
Changes by STINNER Victor:
resolution: -> fixed
status: open -> closed

[issue13624] UTF-8 encoder performance regression in python3.3

2011-12-18 Thread Roundup Robot
Roundup Robot added the comment:
New changeset fbd797fc3809 by Victor Stinner in branch 'default':
Issue #13624: Write a specialized UTF-8 encoder to allow more optimization
http://hg.python.org/cpython/rev/fbd797fc3809
nosy: +python-dev

[issue13624] UTF-8 encoder performance regression in python3.3

2011-12-18 Thread STINNER Victor
STINNER Victor added the comment:
Patch version 3 fixes compiler warnings (it avoids the variables used for the error handler, which are unneeded for UCS-1).
Added file: http://bugs.python.org/file24023/utf8_encoder-3.patch

[issue13624] UTF-8 encoder performance regression in python3.3

2011-12-18 Thread STINNER Victor
Changes by STINNER Victor:
Removed file: http://bugs.python.org/file24005/utf8_encoder.patch

[issue13624] UTF-8 encoder performance regression in python3.3

2011-12-18 Thread STINNER Victor
STINNER Victor added the comment:
utf8_encoder_prescan.patch: precompute the size of the output to avoid a PyBytes_Resize() at exit. It is much slower:
ASCII: 10 loops, best of 3: 2.06 usec per loop
UCS-1: 1 loops, best of 3: 123 usec per loop
UCS-2: 1 loops, best of 3: 171 usec pe

[issue13624] UTF-8 encoder performance regression in python3.3

2011-12-18 Thread STINNER Victor
STINNER Victor added the comment:
Updated the patch to also fix the size of the small buffer on the stack, as suggested by Antoine.
Added file: http://bugs.python.org/file24021/utf8_encoder-2.patch

[issue13624] UTF-8 encoder performance regression in python3.3

2011-12-17 Thread STINNER Victor
STINNER Victor added the comment:
> 8x faster (!) for a string of 50.000 ASCII characters
Oooh, it's just faster because encoding ASCII to UTF-8 is now O(1). The ASCII data is shared with the UTF-8 data thanks to PEP 393!
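The ASCII, UCS-1, UCS-2 and UCS-4 labels used throughout this thread correspond to strings whose widest character needs 1, 2, 3 or 4 bytes in UTF-8. A minimal sketch of that relationship follows; the sample characters other than "A" and "\xe9" are illustrative choices, not taken from the thread, and the helper name `utf8_length` is hypothetical.

```python
# One sample character per string kind discussed in the thread.
# "\u20ac" and "\U0010ffff" are illustrative picks, not from the thread.
SAMPLES = {
    "ASCII": "A",           # U+0041
    "UCS-1": "\xe9",        # U+00E9
    "UCS-2": "\u20ac",      # U+20AC
    "UCS-4": "\U0010ffff",  # U+10FFFF
}


def utf8_length(char):
    """Return the number of bytes used to encode one character as UTF-8."""
    return len(char.encode("utf-8"))


lengths = {kind: utf8_length(char) for kind, char in SAMPLES.items()}

# For a pure ASCII string the UTF-8 bytes are the same as the ASCII bytes,
# which is why the encoder only has to copy the data in that case.
ascii_text = "A" * 10
same_bytes = ascii_text.encode("utf-8") == ascii_text.encode("ascii")

for kind, size in lengths.items():
    print(f"{kind}: {size} byte(s) per character")
```

For ASCII input the encoded length equals the string length, so the remaining cost is the copy into a new bytes object discussed in the replies.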

[issue13624] UTF-8 encoder performance regression in python3.3

2011-12-17 Thread STINNER Victor
STINNER Victor added the comment:
Python 3.2 (narrow):
ASCII: 1 loops, best of 3: 28.2 usec per loop
UCS-1: 1 loops, best of 3: 59.1 usec per loop
UCS-2: 1 loops, best of 3: 88.8 usec per loop
UCS-4: 1000 loops, best of 3: 254 usec per loop
Python 3.2 (wide):
ASCII: 1 loops, b

[issue13624] UTF-8 encoder performance regression in python3.3

2011-12-17 Thread Florent Xicluna
Changes by Florent Xicluna:
nosy: +flox

[issue13624] UTF-8 encoder performance regression in python3.3

2011-12-17 Thread STINNER Victor
STINNER Victor added the comment:
Oh, Antoine told me that I missed the -s command line argument to timeit:
$ cat bench.sh
echo -n "ASCII: "
./python -m timeit -s 'x="A"*5' 'x.encode("utf-8")'
echo -n "UCS-1: "
./python -m timeit -s 'x="\xe9"*5' 'x.encode("utf-8")'
echo -n "UCS-2: "
./
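The effect of the -s argument can be sketched with the `timeit` module directly: with a setup statement the string is built once outside the timed code, while without it the string construction is timed together with the encoding. This is only an illustration; the string length `N`, the loop count and the helper names are assumptions, not values from the thread.

```python
import timeit

N = 50_000  # assumed string length, for illustration only
ENCODE = 'x.encode("utf-8")'


def time_encode_only(char, number=100):
    """Seconds per loop when the string is built in setup (like timeit -s)."""
    setup = f"x = {char!r} * {N}"
    return timeit.timeit(ENCODE, setup=setup, number=number) / number


def time_build_and_encode(char, number=100):
    """Seconds per loop when the string is rebuilt in every timed iteration."""
    stmt = f"x = {char!r} * {N}\n{ENCODE}"
    return timeit.timeit(stmt, number=number) / number


if __name__ == "__main__":
    for label, char in (("ASCII", "A"), ("UCS-1", "\xe9")):
        only = time_encode_only(char)
        both = time_build_and_encode(char)
        print(f"{label}: encode only {only * 1e6:.2f} usec, "
              f"build + encode {both * 1e6:.2f} usec per loop")
```

Timings vary by machine and Python build, so the printed numbers are not comparable with the figures quoted in this thread.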

[issue13624] UTF-8 encoder performance regression in python3.3

2011-12-17 Thread STINNER Victor
STINNER Victor added the comment:
> Can you please provide your exact testing procedure?
Here it is.
$ cat bench.sh
echo -n "ASCII: "
./python -m timeit 'x="A"*5' 'x.encode("utf-8")'
echo -n "UCS-1: "
./python -m timeit 'x="\xe9"*5' 'x.encode("utf-8")'
echo -n "UCS-2: "
./python -m

[issue13624] UTF-8 encoder performance regression in python3.3

2011-12-17 Thread Martin v . Löwis
Martin v. Löwis added the comment: Can you please provide your exact testing procedure? Standard iobench.py doesn't support testing for separate ASCII, UCS-1 and UCS-2 data, so you must have used some other tool. Exact code, command line parameters, hardware description and timing results wou

[issue13624] UTF-8 encoder performance regression in python3.3

2011-12-17 Thread Jesús Cea Avión
Changes by Jesús Cea Avión:
nosy: +jcea

[issue13624] UTF-8 encoder performance regression in python3.3

2011-12-17 Thread STINNER Victor
New submission from STINNER Victor:
The iobench benchmarking tool showed that the UTF-8 encoder is slower in Python 3.3 than in Python 3.2. The performance depends on the characters of the input string:
* 8x faster (!) for a string of 50.000 ASCII characters
* 1.5x slower for a string of 50.000 UCS