[issue25823] Speed-up oparg decoding on little-endian machines

Armin Rigo Sat, 12 Dec 2015 01:54:08 -0800

Armin Rigo added the comment:

Fwiw, I made a trivial benchmark in C that loads aligned and misaligned shorts 
( http://paste.pound-python.org/show/HwnbCI3Pqsj8bx25Yfwp/ ).  It shows that 
the memcpy() version takes only 65% of the time taken by the two-byte-load 
version on a 2010 laptop.  It takes 75% of the time on a modern server.  On a 
recent little-endian PowerPC machine, 96%.  On aarch64, only 45% (i.e. more 
than twice as fast).  This is all with gcc.  It seems that using memcpy() 
is definitely a win nowadays.
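
For readers without access to the paste, here is a minimal sketch of the two 
load styles being compared.  It is not the benchmark itself, and the function 
names (load_u16_bytes, load_u16_memcpy, host_is_little_endian, 
check_loads_agree) are made up for illustration.  The memcpy() variant only 
matches the byte-wise one on a little-endian host, which is the case the 
issue is about.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Portable version: two single-byte loads combined by shift and or.
   Reads the 16-bit value as little-endian regardless of the host. */
uint16_t load_u16_bytes(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* memcpy() version: copies two bytes into a uint16_t.  This is well
   defined for misaligned pointers, and compilers such as gcc typically
   turn a fixed-size memcpy() like this into one load instruction.
   The result is in host byte order. */
uint16_t load_u16_memcpy(const unsigned char *p)
{
    uint16_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Runtime check of the host byte order (hypothetical helper). */
int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1;
}

/* On a little-endian host both loaders must agree. */
void check_loads_agree(const unsigned char *p)
{
    if (host_is_little_endian())
        assert(load_u16_bytes(p) == load_u16_memcpy(p));
}
```

Calling either loader on an odd address (e.g. buf + 1) exercises the 
misaligned case; a big-endian build would have to keep the byte-wise version 
or swap the bytes after the memcpy().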


----------
nosy: +arigo

_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue25823>
_______________________________________