[issue14419] Faster ascii decoding

2012-05-11 Thread Antoine Pitrou
Changes by Antoine Pitrou: dependencies: -Amazingly faster UTF-8 decoding; superseder: -> Amazingly faster UTF-8 decoding

[issue14419] Faster ascii decoding

2012-05-11 Thread Antoine Pitrou
Antoine Pitrou added the comment: Okay, thank you!
dependencies: +Amazingly faster UTF-8 decoding; resolution: -> duplicate; stage: -> committed/rejected; status: open -> closed

[issue14419] Faster ascii decoding

2012-05-11 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Since the patch was committed as part of the new UTF-8 decoder (issue14738), this issue can be closed.

[issue14419] Faster ascii decoding

2012-04-01 Thread Jesús Cea Avión
Changes by Jesús Cea Avión: nosy: +jcea

[issue14419] Faster ascii decoding

2012-03-27 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: As you can see, the unpatched code does not depend on the alignment. With the patches, aligned data (which constitutes the vast majority, if not all) is decoded much faster, and non-aligned data is sometimes decoded slightly slower. Time of decoding 2-10-bytes practical

[issue14419] Faster ascii decoding

2012-03-27 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: A Python script is too rough a tool to measure decoding performance on short strings, so I used C. The benchmark scheme is as follows. A chunk of memory big enough to reduce the effect of the processor cache is taken. This area is split into many pieces with the

[issue14419] Faster ascii decoding

2012-03-27 Thread Antoine Pitrou
Antoine Pitrou added the comment:
> This may also depend on the processor and compiler. I have AMD Athlon 64 X2 4600+ (2-core, 2.4GHz, 512 KB cache) and use gcc 4.4.3 on 32-bit Linux.
Then by choosing a string length that exceeds the L2 cache size, you may have found an ideal case for you

[issue14419] Faster ascii decoding

2012-03-27 Thread Serhiy Storchaka
Serhiy Storchaka added the comment:
> New tests. I'm not convinced by the patch: it slows down the decoder for "short" strings. I don't understand which kind of ASCII encoded strings (specific length or content?) are optimized by the patch.
Maybe you forgot the -r? Add -r 100 or -r 1000 and

[issue14419] Faster ascii decoding

2012-03-27 Thread STINNER Victor
STINNER Victor added the comment: New tests. I'm not convinced by the patch: it slows down the decoder for "short" strings. I don't understand which kind of ASCII encoded strings (specific length or content?) are optimized by the patch. Unpatched: $ ./python -m timeit -n 5 -r 100 -s 'data

[issue14419] Faster ascii decoding

2012-03-27 Thread Serhiy Storchaka
Serhiy Storchaka added the comment:
> q is not the address of the Unicode string, but the address of the data following the Unicode structure in memory. Strings created by PyUnicode_New() are composed of a single memory block: {structure, data}.
I know all that. #define _PyUnicode_COMP

[issue14419] Faster ascii decoding

2012-03-27 Thread STINNER Victor
STINNER Victor added the comment:
>> +#if SIZEOF_LONG <= SIZEOF_VOID_P
>> +    if (!((size_t) p & LONG_PTR_MASK)) {
>>
>> I wrote "q", not "p". You have to check p and q alignment to be able to dereference p and q pointers.
>
> Initial q (destination) is always pointer-aligned, because PyASC

[issue14419] Faster ascii decoding

2012-03-26 Thread Serhiy Storchaka
Serhiy Storchaka added the comment:
> +#if SIZEOF_LONG <= SIZEOF_VOID_P
> +if (!((size_t) p & LONG_PTR_MASK)) {
>
> I wrote "q", not "p". You have to check p and q alignment to be able to dereference p and q pointers.
Initial q (destination) is always pointer-aligned, because PyASCIIObj

[issue14419] Faster ascii decoding

2012-03-26 Thread STINNER Victor
STINNER Victor added the comment:
+#if SIZEOF_LONG <= SIZEOF_VOID_P
+if (!((size_t) p & LONG_PTR_MASK)) {
I wrote "q", not "p". You have to check p and q alignment to be able to dereference p and q pointers. sizeof(long) <= sizeof(void*) is always true.
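The point in the message above, that both the source and the destination must be aligned before either is accessed a `long` at a time, can be illustrated with a minimal sketch. This is not the code from the patch: `copy_ascii_prefix`, `ASCII_HIGH_BITS` and `LONG_ALIGN_MASK` are hypothetical names invented for the illustration, and the sketch uses `memcpy` for the word accesses instead of dereferencing `long *` pointers, which sidesteps aliasing concerns.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names for this sketch; none of them come from the patch. */

/* An unsigned long with the high bit of every byte set (0x80 repeated). */
#define ASCII_HIGH_BITS ((unsigned long)-1 / 0xFF * 0x80)
/* Low address bits that must be zero for unsigned long alignment. */
#define LONG_ALIGN_MASK ((uintptr_t)(sizeof(unsigned long) - 1))

/* Copy bytes from src to dst for as long as they are ASCII (< 0x80).
   Returns the number of bytes copied; a result smaller than len means
   that src[result] is the first non-ASCII byte. */
size_t
copy_ascii_prefix(const unsigned char *src, size_t len, unsigned char *dst)
{
    size_t i = 0;

    assert(src != NULL && dst != NULL);
    while (i < len) {
        /* Take the word-sized fast path only when BOTH pointers are
           aligned: OR-ing the addresses tests them in one comparison. */
        if ((((uintptr_t)(src + i) | (uintptr_t)(dst + i))
             & LONG_ALIGN_MASK) == 0) {
            while (len - i >= sizeof(unsigned long)) {
                unsigned long word;
                memcpy(&word, src + i, sizeof word);
                if (word & ASCII_HIGH_BITS)
                    break;      /* some byte in this word is >= 0x80 */
                memcpy(dst + i, &word, sizeof word);
                i += sizeof word;
            }
            if (i == len)
                break;
        }
        /* Slow path: one byte at a time. */
        if (src[i] & 0x80)
            break;
        dst[i] = src[i];
        i++;
    }
    return i;
}
```

A caller would pass the input buffer, its length, and an output buffer of at least `len` bytes, then compare the return value with `len` to see whether the whole input was ASCII.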

[issue14419] Faster ascii decoding

2012-03-26 Thread Serhiy Storchaka
Changes by Serhiy Storchaka: Added file: http://bugs.python.org/file25034/decode_ascii_2.patch

[issue14419] Faster ascii decoding

2012-03-26 Thread Serhiy Storchaka
Serhiy Storchaka added the comment:
> +if (((size_t) p & LONG_PTR_MASK) == ((size_t) q & LONG_PTR_MASK)) {
> This test looks. I think that it should be replaced by:
> if (!((size_t) q & LONG_PTR_MASK)) {
"if (!((size_t) p & LONG_PTR_MASK)) {" if sizeof(long) <= sizeof(void *). And rewr

[issue14419] Faster ascii decoding

2012-03-26 Thread STINNER Victor
STINNER Victor added the comment:
+if (((size_t) p & LONG_PTR_MASK) == ((size_t) q & LONG_PTR_MASK)) {
This test looks. I think that it should be replaced by:
if (!((size_t) q & LONG_PTR_MASK)) {
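The two tests quoted above check different things, and a small sketch may make the difference concrete. The first compares the low address bits of `p` and `q`, so it holds whenever both pointers are misaligned by the same amount; the second holds only when the tested pointer is itself aligned. The helper names and `SKETCH_LONG_MASK` below are hypothetical, and the mask is assumed to be `sizeof(long) - 1`, mirroring what `LONG_PTR_MASK` appears to mean in the quoted code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed stand-in for LONG_PTR_MASK; the name is hypothetical. */
#define SKETCH_LONG_MASK ((uintptr_t)(sizeof(long) - 1))

/* True when ptr sits on a sizeof(long) boundary. */
int
is_long_aligned(const void *ptr)
{
    return ((uintptr_t)ptr & SKETCH_LONG_MASK) == 0;
}

/* True when p and q are off a long boundary by the same number of bytes,
   so stepping both forward together makes them aligned at the same time. */
int
same_misalignment(const void *p, const void *q)
{
    return ((uintptr_t)p & SKETCH_LONG_MASK)
        == ((uintptr_t)q & SKETCH_LONG_MASK);
}

/* Number of single-byte steps needed before ptr becomes long-aligned. */
size_t
bytes_until_aligned(const void *ptr)
{
    /* The mask trick only works when sizeof(long) is a power of two. */
    assert((sizeof(long) & (sizeof(long) - 1)) == 0);
    return (sizeof(long) - ((uintptr_t)ptr & SKETCH_LONG_MASK))
        & SKETCH_LONG_MASK;
}
```

If `same_misalignment(p, q)` holds, copying `bytes_until_aligned(p)` bytes one at a time leaves both pointers aligned. If only `is_long_aligned(q)` has been checked, nothing is yet known about `p`.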

[issue14419] Faster ascii decoding

2012-03-26 Thread STINNER Victor
STINNER Victor added the comment: Results on a 64-bit Linux box, Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz with 12 GB of RAM:
Unpatched: 1 loops, best of 3: 150 usec per loop
Patched: 1 loops, best of 3: 80.2 usec per loop
nosy: +haypo

[issue14419] Faster ascii decoding

2012-03-26 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: I would like someone to test it on a 64-bit platform. Maybe it is worth copying smaller blocks, not sizeof(long) but sizeof(void *)? This would guarantee the alignment of the destination (and very likely the source).

[issue14419] Faster ascii decoding

2012-03-26 Thread STINNER Victor
Changes by STINNER Victor: nosy: +pitrou

[issue14419] Faster ascii decoding

2012-03-26 Thread Serhiy Storchaka
New submission from Serhiy Storchaka: The proposed patch accelerates ASCII decoding in a particular case, when the alignment of the input data coincides with the alignment of data in PyASCIIObject. This is a common case on 32-bit platforms. I did not check whether the patch has any effect on