[issue13155] Optimize finding the max character width

2011-10-12 Thread Antoine Pitrou
Antoine Pitrou added the comment:

> find_max_char5.patch:
> - don't use adjusted ~mask+1: "precompute" the right max_char
> - rename findwidth.h to find_max_char.h
> - add some #undef

Thank you, I've committed this version.

--
resolution: -> fixed
stage: -> committed/rejected
status:

2011-10-12 Thread Roundup Robot
Roundup Robot added the comment:

New changeset 9c7d3207fc15 by Antoine Pitrou in branch 'default':
Issue #13155: Optimize finding the optimal character width of an unicode string
http://hg.python.org/cpython/rev/9c7d3207fc15

--
nosy: +python-dev

2011-10-12 Thread STINNER Victor
STINNER Victor added the comment:

find_max_char5.patch:
- don't use adjusted ~mask+1: "precompute" the right max_char
- rename findwidth.h to find_max_char.h
- add some #undef

--
Added file: http://bugs.python.org/file23392/find_max_char5.patch

2011-10-12 Thread STINNER Victor
STINNER Victor added the comment:

With find_max_char4.patch:

python3.3 -m timeit 'x="é"+"x"*1' 'x[1:]'
10 loops, best of 3: 1.96 usec per loop

2011-10-12 Thread STINNER Victor
STINNER Victor added the comment:

Without the patch:

python3.2 -m timeit 'x="é"+"x"*1' 'x[1:]'
10 loops, best of 3: 2.18 usec per loop

python3.3 -m timeit 'x="é"+"x"*1' 'x[1:]'
10 loops, best

2011-10-11 Thread Antoine Pitrou
Antoine Pitrou added the comment:

> > Ok, updated patch.
>
> "ret = ~mask + 1;" looks wrong: (~0xFF80+1) gives 128, not 127.

That's on purpose, since the mask has just matched. If 0xFF80 matches, then the max char can't be 127, it has to be at least 128.

> I don't see why you need:
>
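The two readings of the mask in this exchange can be checked with a small C sketch, assuming 16-bit masks such as 0xFF80 and 0xFF00 (the helper names are hypothetical, not taken from the patch). `~mask` is the largest code point the mask does not match, i.e. the max char when the mask never fires; `~mask + 1` is the smallest code point it does match, which is only a lower bound once the mask has fired.

```c
#include <assert.h>
#include <stdint.h>

/* Largest code point that a high-bit mask such as 0xFF80 does NOT match:
   if the mask never matched, the max char is at most this value. */
static uint16_t max_char_if_no_match(uint16_t mask)
{
    /* ~ promotes the operand to int, so truncate back to 16 bits. */
    return (uint16_t)~mask;
}

/* Smallest code point that the mask DOES match: once the mask has matched,
   the max char is at least this value ("ret = ~mask + 1"). */
static uint16_t max_char_lower_bound_if_match(uint16_t mask)
{
    /* int arithmetic, then a well-defined conversion back to uint16_t. */
    return (uint16_t)(~mask + 1);
}
```

For 0xFF80 these give 127 and 128, and for 0xFF00 they give 255 and 256, which is exactly the off-by-one difference being discussed.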

2011-10-11 Thread STINNER Victor
STINNER Victor added the comment:

> Ok, updated patch.

"ret = ~mask + 1;" looks wrong: (~0xFF80+1) gives 128, not 127.

I don't see why you need:

+    if (ret < 128)
+        return 127;
+    if (ret < 256)
+        return 255;

#undef ASCII_CHAR_MASK should be #undef UCS1_ASCII_CHAR_MASK
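The quoted `+if (ret < 128)` lines round the result up to the top of its storage class. A minimal sketch of that rounding, extended by assumption to the 2- and 4-byte classes (the function names and the two extra branches are hypothetical, not from the patch), is shown below; the width helper follows the PEP 393 layout, where ASCII and Latin-1 strings both use one byte per character.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: round a max char (or a lower bound on it) up to the
   largest code point of its storage class. */
static uint32_t round_max_char(uint32_t ret)
{
    if (ret < 128)
        return 127;        /* ASCII */
    if (ret < 256)
        return 255;        /* Latin-1 (UCS1) */
    if (ret < 0x10000)
        return 0xFFFF;     /* BMP (UCS2) */
    return 0x10FFFF;       /* full Unicode range (UCS4) */
}

/* Bytes per character for a given max char (PEP 393: ASCII and Latin-1
   share the 1-byte representation). */
static int char_width_bytes(uint32_t max_char)
{
    if (max_char < 256)
        return 1;
    if (max_char < 0x10000)
        return 2;
    return 4;
}
```

With such rounding in place, a lower bound of 128 from `~mask + 1` would be reported as 255, so callers never see the raw bound; that may be why the patch carried these lines, though the thread does not say so explicitly.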

2011-10-11 Thread Antoine Pitrou
Antoine Pitrou added the comment:

Ok, updated patch.

--
Added file: http://bugs.python.org/file23382/find_max_char4.patch

2011-10-11 Thread STINNER Victor
STINNER Victor added the comment:

find_max_char() returns 0x1 instead of 0x10, which may be wrong (or at least, surprising). You may add a max_char variable using other macros like MAX_CHAR_ASCII, MAX_CHAR_UCS1, ..., which will be set at the same time as mask. Or restore your if (re
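Victor's alternative can be sketched in C for a 2-byte (UCS2) buffer. The macro and function names below are hypothetical, only modeled on the MAX_CHAR_ASCII / MAX_CHAR_UCS1 names he mentions, and this is a sketch of the idea rather than the committed find_max_char.h code. The max_char value is assigned at the same point the mask is widened, so the result is always a canonical bound and no `~mask + 1` arithmetic is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MASK_ASCII     0xFF80u  /* any of these bits set => char >= 128 */
#define MASK_UCS1      0xFF00u  /* any of these bits set => char >= 256 */
#define MAX_CHAR_ASCII 0x7Fu
#define MAX_CHAR_UCS1  0xFFu
#define MAX_CHAR_UCS2  0xFFFFu

/* Return 127, 255 or 0xFFFF: the max char class of a UCS2 buffer. */
static uint32_t find_max_char_ucs2(const uint16_t *begin, const uint16_t *end)
{
    uint16_t mask = (uint16_t)MASK_ASCII;
    uint32_t max_char = MAX_CHAR_ASCII;

    for (const uint16_t *p = begin; p < end; p++) {
        if (*p & mask) {
            if (*p & MASK_UCS1)
                return MAX_CHAR_UCS2;    /* >= 256: widest class, stop early */
            /* char in 128..255: widen the mask and max_char together */
            mask = (uint16_t)MASK_UCS1;
            max_char = MAX_CHAR_UCS1;
        }
    }
    return max_char;
}
```

Because each mask is paired with its own max_char, the function only ever returns 127, 255 or 0xFFFF for a UCS2 buffer.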

2011-10-11 Thread Antoine Pitrou
Antoine Pitrou added the comment:

Slightly cleaned-up patch after Victor's comments in private.

--
Added file: http://bugs.python.org/file23381/find_max_char3.patch

2011-10-11 Thread Antoine Pitrou
Antoine Pitrou added the comment:

I hadn't noticed the STRINGLIB_SIZEOF_CHAR constant. Reuse it instead of adding STRINGLIB_CHAR_SIZE.

--
Added file: http://bugs.python.org/file23380/find_max_char2.patch
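CPython's stringlib compiles the same source once per character width, which is why a size constant such as STRINGLIB_SIZEOF_CHAR is available to the template code. A single-file stand-in for that idea can be sketched with a function-generating macro, assuming the hypothetical DEFINE_FIND_MAX_CHAR name and a plain max-loop body (not the mask technique the patch uses); the size argument decides the value at which the scan can stop early.

```c
#include <assert.h>
#include <stdint.h>

/* SIZEOF_CHAR plays the role of STRINGLIB_SIZEOF_CHAR; all names are
   hypothetical. Returns the exact max code point in [p, end). */
#define DEFINE_FIND_MAX_CHAR(NAME, CHAR_T, SIZEOF_CHAR)                   \
    static uint32_t NAME(const CHAR_T *p, const CHAR_T *end)              \
    {                                                                      \
        /* Largest value this width can hold: no need to scan past it. */ \
        const uint32_t limit = (SIZEOF_CHAR) == 1 ? 0xFFu                  \
                             : (SIZEOF_CHAR) == 2 ? 0xFFFFu : 0x10FFFFu;   \
        uint32_t max_char = 0;                                             \
        for (; p < end; p++) {                                             \
            if (*p > max_char) {                                           \
                max_char = *p;                                             \
                if (max_char >= limit)                                     \
                    break; /* already at the top for this width */        \
            }                                                              \
        }                                                                  \
        return max_char;                                                   \
    }

DEFINE_FIND_MAX_CHAR(find_max_char_1, uint8_t, 1)
DEFINE_FIND_MAX_CHAR(find_max_char_2, uint16_t, 2)
DEFINE_FIND_MAX_CHAR(find_max_char_4, uint32_t, 4)
```

In the real stringlib, each inclusion defines its own naming and constant macros, which is presumably also why the patch needs matching #undef lines after the template.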

2011-10-11 Thread Antoine Pitrou
New submission from Antoine Pitrou:

This patch optimizes scanning for the max character width in a Unicode buffer. Micro-benchmarking some worst-case situations:

$ ./python -m timeit -s "x='é'+'x'*10" "x[1:]"
-> before: 1 loops, best of 3: 74.9 usec per loop
-> after: 10 loops, be
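One common way to speed up such a scan for 1-byte (UCS1) buffers is to test a whole machine word at a time against a mask with the high bit of every byte set. The sketch below assumes that technique; HIGH_BITS_MASK and find_max_char_ucs1 are hypothetical names, and the thread only mentions a UCS1_ASCII_CHAR_MASK macro without showing its definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* High bit of every byte in a size_t, e.g. 0x8080808080808080 on 64-bit. */
#define HIGH_BITS_MASK ((size_t)-1 / 0xFF * 0x80)

/* Return 127 if every byte is ASCII, else 255 (max char of a UCS1 buffer). */
static uint32_t find_max_char_ucs1(const unsigned char *p, size_t n)
{
    size_t i = 0;

    /* Word-at-a-time pass; memcpy avoids unaligned and aliasing issues. */
    for (; i + sizeof(size_t) <= n; i += sizeof(size_t)) {
        size_t word;
        memcpy(&word, p + i, sizeof word);
        if (word & HIGH_BITS_MASK)
            return 255;
    }
    /* Byte-at-a-time tail for the remaining n % sizeof(size_t) bytes. */
    for (; i < n; i++) {
        if (p[i] & 0x80)
            return 255;
    }
    return 127;
}
```

In the benchmark above, `x[1:]` drops the leading 'é', so the remaining all-ASCII tail never triggers an early exit and every byte has to be examined; that is the worst case a word-at-a-time loop speeds up.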