New submission from Serhiy Storchaka:

Currently _sre.getlower() takes two arguments. Depending on the bits set in the
second argument, it uses one of three algorithms for determining the lowercase
form of a character: Unicode, ASCII-only, or locale-dependent. After resolving
issue30215, _sre.getlower() is no longer used for the locale-dependent case. The
proposed patch replaces _sre.getlower() with two one-argument functions:
_sre.ascii_tolower() and _sre.unicode_tolower(). This slightly speeds up
compiling case-insensitive regular expressions, especially those containing ranges.
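A minimal pure-Python sketch of the split, assuming the helpers take and return integer code points. The names `getlower`, `ascii_tolower` and `unicode_tolower` below are hypothetical stand-ins written for illustration, not the C functions in `_sre`, and `unicode_tolower` only approximates the Unicode simple lowercase mapping with `str.lower()`.

```python
import re


def ascii_tolower(ch: int) -> int:
    """Lowercase only the ASCII letters A-Z; leave all other code points alone."""
    if ord("A") <= ch <= ord("Z"):
        return ch + (ord("a") - ord("A"))
    return ch


def unicode_tolower(ch: int) -> int:
    """Approximate the Unicode simple lowercase mapping via str.lower()."""
    lowered = chr(ch).lower()
    # str.lower() may expand one character into several (e.g. U+0130);
    # this sketch then keeps the original code point unchanged.
    return ord(lowered) if len(lowered) == 1 else ch


def getlower(ch: int, flags: int) -> int:
    """Old-style helper: re-examines the flag bits on every call.

    The locale-dependent branch is omitted, since after issue30215 it is
    no longer taken.
    """
    if flags & re.ASCII:
        return ascii_tolower(ch)
    return unicode_tolower(ch)


# New style: choose the one-argument function once, then call it directly.
flags = re.ASCII
tolower = ascii_tolower if flags & re.ASCII else unicode_tolower
print(tolower(ord("Q")), getlower(ord("Q"), flags))  # 113 113
```

The design point is that the flag test moves out of the per-character call: the caller decides once which mapping applies and then invokes a function that does nothing but map.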

$ ./python -m timeit -s 'import sre_compile' 'sre_compile.compile("(?i)ABCDEFGHIJKLMNOPQRSTUVWXYZ", 0)'
Unpatched:  2000 loops, best of 5: 180 usec per loop
Patched:    2000 loops, best of 5: 173 usec per loop

$ ./python -m timeit -s 'import sre_compile' 'sre_compile.compile("(?ia)ABCDEFGHIJKLMNOPQRSTUVWXYZ", 0)'
Unpatched:  2000 loops, best of 5: 175 usec per loop
Patched:    2000 loops, best of 5: 168 usec per loop

$ ./python -m timeit -s 'import sre_compile' 'sre_compile.compile("(?i)[A-Z]", 0)'
Unpatched:  500 loops, best of 5: 788 usec per loop
Patched:    500 loops, best of 5: 766 usec per loop

$ ./python -m timeit -s 'import sre_compile' 'sre_compile.compile("(?ia)[A-Z]", 0)'
Unpatched:  5000 loops, best of 5: 92 usec per loop
Patched:    5000 loops, best of 5: 83.2 usec per loop

$ ./python -m timeit -s 'import sre_compile' 'sre_compile.compile("(?i)[\u0410-\u042f]", 0)'
Unpatched:  2000 loops, best of 5: 141 usec per loop
Patched:    2000 loops, best of 5: 122 usec per loop
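The `(?i)` and `(?ia)` modes rely on different lowering functions. As a quick check with the public `re` API (not the `_sre` internals), the Cyrillic range from this benchmark matches a lowercase Cyrillic letter under Unicode case-insensitive matching, but not when the ASCII flag restricts case folding to A-Z.

```python
import re

pattern = "[\u0410-\u042f]"  # Cyrillic capital letters U+0410..U+042F

# Unicode case-insensitive matching: lowercase U+0430 matches the range.
unicode_match = re.fullmatch("(?i)" + pattern, "\u0430")

# ASCII-only case-insensitive matching: non-ASCII letters are not case-folded.
ascii_match = re.fullmatch("(?ia)" + pattern, "\u0430")

print(unicode_match is not None, ascii_match is not None)  # True False
```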

$ ./python -m timeit -s 'import sre_compile' 'sre_compile.compile("(?i)[\u0000-\uffff]", 0)'
Unpatched:  5 loops, best of 5: 59 msec per loop
Patched:    10 loops, best of 5: 28.9 msec per loop
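The roughly twofold gain on `[\u0000-\uffff]` is consistent with the lowering function being applied once per code point of the range (65536 calls here), so any per-call overhead is multiplied. The sketch below assumes that model; `lower_range_two_arg`, `lower_range_one_arg` and the two helpers are hypothetical pure-Python stand-ins, not the actual `sre_compile` code, and the timings it prints are not the figures reported above.

```python
import re
import timeit


def ascii_tolower(ch: int) -> int:
    # Lowercase only ASCII A-Z (code points 65..90).
    return ch + 32 if 65 <= ch <= 90 else ch


def unicode_tolower(ch: int) -> int:
    # Approximation of the Unicode simple lowercase mapping.
    lowered = chr(ch).lower()
    return ord(lowered) if len(lowered) == 1 else ch


def lower_range_two_arg(lo: int, hi: int, flags: int) -> set:
    # Old style: a wrapper re-checks the flags for every code point.
    def fixup(ch: int) -> int:
        return ascii_tolower(ch) if flags & re.ASCII else unicode_tolower(ch)
    return set(map(fixup, range(lo, hi + 1)))


def lower_range_one_arg(lo: int, hi: int, flags: int) -> set:
    # New style: pick the one-argument function once and map it directly.
    fixup = ascii_tolower if flags & re.ASCII else unicode_tolower
    return set(map(fixup, range(lo, hi + 1)))


# Both variants produce the same set; only the per-call overhead differs.
assert lower_range_two_arg(0, 0xFFFF, 0) == lower_range_one_arg(0, 0xFFFF, 0)
for func in (lower_range_two_arg, lower_range_one_arg):
    t = timeit.timeit(lambda: func(0, 0xFFFF, 0), number=5)
    print(f"{func.__name__}: {t:.3f} s for 5 runs")
```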

----------
assignee: serhiy.storchaka
components: Library (Lib), Regular Expressions
messages: 293049
nosy: ezio.melotti, mrabarnett, serhiy.storchaka
priority: normal
severity: normal
stage: patch review
status: open
title: Speeds up compiling cases-insensitive regular expressions
type: performance
versions: Python 3.7

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue30277>
_______________________________________