Martin v. Löwis <mar...@v.loewis.de> added the comment:

Can you please provide your exact testing procedure? The standard iobench.py doesn't support testing ASCII, UCS-1, and UCS-2 data separately, so you must have used some other tool. Exact code, command-line parameters, a hardware description, and timing results would be appreciated.
Looking at the encoder, I think the first thing to change is to reduce the over-allocation for UCS-1 and UCS-2 strings. This may or may not help the run time, but it should reduce memory consumption. I wonder whether making two passes over the string (one to compute the output size, and a second to fill an exactly-sized result buffer) could improve performance. If there is to be further special-casing, I'd only special-case UCS-1. I doubt that the _READ() macro really is the bottleneck; I would rather expect that loop unrolling can help. Because of disallowed surrogates, unrolling is not practical for UCS-2.
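To make the two-pass idea concrete, here is a minimal standalone sketch for the UCS-1 case. The function name and signature are invented for illustration; this is not CPython's actual encoder:

    #include <stddef.h>
    #include <stdint.h>
    #include <stdlib.h>

    /* Two-pass UTF-8 encoder for UCS-1 (Latin-1) input.  The first
     * pass computes the exact output size, so the result buffer needs
     * no over-allocation and no resize afterwards.  Illustration only,
     * not CPython code. */
    static char *
    latin1_to_utf8_twopass(const uint8_t *in, size_t len, size_t *out_len)
    {
        size_t size = len, i;
        char *out, *p;

        /* Pass 1: each byte >= 0x80 expands to two UTF-8 bytes. */
        for (i = 0; i < len; i++)
            size += in[i] >> 7;

        out = malloc(size + 1);
        if (out == NULL)
            return NULL;

        /* Pass 2: write into the exactly-sized buffer. */
        p = out;
        for (i = 0; i < len; i++) {
            uint8_t ch = in[i];
            if (ch < 0x80) {
                *p++ = (char)ch;
            }
            else {
                *p++ = (char)(0xC0 | (ch >> 6));   /* 0xC2 or 0xC3 */
                *p++ = (char)(0x80 | (ch & 0x3F));
            }
        }
        *p = '\0';
        *out_len = size;
        return out;
    }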
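Similarly, here is a rough sketch of what an unrolled ASCII fast path over UCS-1 data could look like, again with an invented name:

    #include <stddef.h>
    #include <stdint.h>

    /* 4x-unrolled ASCII fast path over UCS-1 input: one combined
     * high-bit test covers four bytes per iteration.  Illustration
     * only, not CPython code. */
    static size_t
    copy_ascii_run(const uint8_t *in, size_t len, char *out)
    {
        size_t i = 0;

        while (i + 4 <= len) {
            /* If any of the four bytes has its high bit set, stop. */
            if ((in[i] | in[i + 1] | in[i + 2] | in[i + 3]) & 0x80)
                break;
            out[i]     = (char)in[i];
            out[i + 1] = (char)in[i + 1];
            out[i + 2] = (char)in[i + 2];
            out[i + 3] = (char)in[i + 3];
            i += 4;
        }
        /* Finish the tail (or stop at the first non-ASCII byte). */
        while (i < len && in[i] < 0x80) {
            out[i] = (char)in[i];
            i++;
        }
        return i;  /* number of ASCII bytes copied */
    }

For UCS-2, the per-unit test would also have to reject lone surrogates (0xD800-0xDFFF), so the cheap combined test above has no equally cheap equivalent there.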
----------
nosy: +loewis

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue13624>
_______________________________________