[issue4868] Faster utf-8 decoding

Marc-Andre Lemburg Wed, 07 Jan 2009 10:36:01 -0800

Marc-Andre Lemburg <m...@egenix.com> added the comment:

On 2009-01-07 16:25, Antoine Pitrou wrote:
> New submission from Antoine Pitrou <pit...@free.fr>:
> 
> Here is a patch to speed up UTF-8 decoding. On a 64-bit build, the maximum
> speedup is around 30%, and on a 32-bit build around 15%. (*)
> 
> The patch may look disturbingly trivial, and I haven't studied the
> assembler output, but I think it is explained by the fact that having a
> separate loop counter breaks the register dependencies (when the 's'
> pointer was incremented, other operations had to wait for the
> increment to be committed).
> 
> [side note: utf8 encoding is still much faster than decoding, but it may
> be because it allocates a smaller object, regardless of the iteration count]
> 
> The same principle can probably be applied to the other decoding
> functions in unicodeobject.c, but first I wanted to know whether the
> principle is ok to apply. Marc-André, what is your take?


I'm +1 on anything that makes codecs faster :-)

However, the patch should be checked with some other compilers
as well, e.g. using MS VC++.

> (*) the benchmark I used is:
> 
> ./python -m timeit -s "import codecs;c=codecs.utf_8_decode;s=b'abcde'*1000" "c(s)"
> 
> More complex input also gets a speedup, albeit a smaller one (~10%):
> 
> ./python -m timeit -s "import codecs;c=codecs.utf_8_decode;s=b'\xc3\xa9\xe7\xb4\xa2'*1000" "c(s)"
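
For anyone re-running the comparison on another compiler or build, here is a
minimal sketch of the same two measurements as a script using the timeit
module. The helper name time_utf8_decode and the number/repeat values are
assumptions chosen for illustration; they are not part of the patch or of the
original benchmark, and absolute timings will differ between machines.

```python
import codecs
import timeit


def time_utf8_decode(data, number=1000, repeat=3):
    """Return the best observed time (seconds) for one codecs.utf_8_decode(data) call.

    Hypothetical helper: 'number' and 'repeat' are illustrative defaults.
    """
    timer = timeit.Timer(lambda: codecs.utf_8_decode(data))
    # Take the minimum over several runs to reduce noise from other processes.
    return min(timer.repeat(repeat=repeat, number=number)) / number


# Same inputs as the two command-line benchmarks quoted above.
ascii_input = b"abcde" * 1000                   # pure ASCII, 1 byte per character
mixed_input = b"\xc3\xa9\xe7\xb4\xa2" * 1000    # 2-byte and 3-byte sequences

if __name__ == "__main__":
    for label, data in (("ascii", ascii_input), ("mixed", mixed_input)):
        usec = time_utf8_decode(data) * 1e6
        print(f"{label}: {usec:.2f} usec per call")
```

Running the script once on an unpatched build and once on a patched build of
the same interpreter gives the two figures to compare.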

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue4868>
_______________________________________