Antoine Pitrou added the comment:
Patch committed. It is of course still not as fast as memcpy, but it's a small
step towards improving performance.
--
resolution: -> fixed
stage: patch review -> committed/rejected
status: open -> closed
___
Python
Roundup Robot added the comment:
New changeset 5b077c962a16 by Antoine Pitrou in branch 'default':
Issue #13136: speed up conversion between different character widths.
http://hg.python.org/cpython/rev/5b077c962a16
--
nosy: +python-dev
Marc-Andre Lemburg added the comment:
Antoine Pitrou wrote:
>
>> I tested using memchr() when writing those "naive" loops.
>
> memchr() is mentioned in another issue, #13134.
Looks like I posted the comment to the wrong ticket.
--
Martin v. Löwis added the comment:
Marc-Andre: gcc will normally not unroll loops, unless -funroll-loops is given
on the command line. Then, it will unroll many loops, and do so with 8
iterations per outer loop. This typically causes significant code bloat, which
is why unrolling is normally
Meador Inge added the comment:
On Sat, Oct 8, 2011 at 5:34 PM, Antoine Pitrou wrote:
> Antoine Pitrou added the comment:
>
>> Before going further with this, I'd suggest you have a look at your
>> compiler settings.
>
> They are set by the configure script:
>
> gcc -pthread -c -Wno-unused-res
Antoine Pitrou added the comment:
> Before going further with this, I'd suggest you have a look at your
> compiler settings.
They are set by the configure script:
gcc -pthread -c -Wno-unused-result -DNDEBUG -g -fwrapv -O3 -Wall
-Wstrict-prototypes -I. -I./Include -DPy_BUILD_CORE -o
Objec
Marc-Andre Lemburg added the comment:
Antoine Pitrou wrote:
>
> New submission from Antoine Pitrou :
>
> This patch speeds up _PyUnicode_CONVERT_BYTES by unrolling its loop.
>
> Example micro-benchmark:
>
> ./python -m timeit -s "a='x'*1;b='\u0102'*1000;c='\U0010'" "a+b+c"
>
> -> be
New submission from Antoine Pitrou :
This patch speeds up _PyUnicode_CONVERT_BYTES by unrolling its loop.
Example micro-benchmark:
./python -m timeit -s "a='x'*1;b='\u0102'*1000;c='\U0010'" "a+b+c"
-> before:
10 loops, best of 3: 14.9 usec per loop
-> after:
10 loops, best of 3
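
For readers unfamiliar with the technique, here is a minimal sketch of what manually unrolling a character-width conversion loop can look like. It is not the committed patch and not the actual _PyUnicode_CONVERT_BYTES macro: the function name widen_u8_to_u16, the fixed 1-byte to 2-byte conversion, and the unroll factor of 4 are all assumptions chosen for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not CPython code): widen n 1-byte code units
 * into 2-byte code units, converting four elements per iteration. */
static void widen_u8_to_u16(const uint8_t *src, uint16_t *dst, size_t n)
{
    const uint8_t *end = src + n;
    /* Largest prefix whose length is a multiple of 4. */
    const uint8_t *unrolled_end = src + (n & ~(size_t)3);

    /* Unrolled body: four independent stores per loop test. */
    while (src < unrolled_end) {
        dst[0] = src[0];
        dst[1] = src[1];
        dst[2] = src[2];
        dst[3] = src[3];
        src += 4;
        dst += 4;
    }
    /* Tail: at most three leftover elements. */
    while (src < end)
        *dst++ = *src++;
}
```

The unrolled body trades a little code size for fewer loop-condition checks per converted element. The tail loop keeps the function correct for lengths that are not a multiple of 4.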