Serhiy Storchaka <storchaka+cpyt...@gmail.com> added the comment:

It seems to me that the regular expressions used in the lib2to3 version are more 
efficient but more complex.

$ ./python -m timeit -s 'import re; p = re.compile(r"0[bB](?:_?[01])+"); s = "0b"+"_0101"*16' 'p.match(s)'
100000 loops, best of 5: 2.45 usec per loop

$ ./python -m timeit -s 'import re; p = re.compile(r"0[bB]_?[01]+(?:_[01]+)*"); s = "0b"+"_0101"*16' 'p.match(s)'
200000 loops, best of 5: 1.08 usec per loop

$ ./python -m timeit -s 'import re; p = re.compile(r"0[xX](?:_?[0-9a-fA-F])+[lL]?"); s = "0x_0123_4567_89ab_cdef"' 'p.match(s)'
500000 loops, best of 5: 815 nsec per loop

$ ./python -m timeit -s 'import re; p = re.compile(r"0[xX]_?[\da-fA-F]+(?:_[\da-fA-F]+)*[lL]?"); s = "0x_0123_4567_89ab_cdef"' 'p.match(s)'
500000 loops, best of 5: 542 nsec per loop
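
As a rough sanity check that the two binary-literal patterns timed above accept the same strings, here is a minimal sketch. The names `simple`, `unrolled` and `samples` are only illustrative, and agreement on a handful of samples is not a proof of equivalence.

```python
import re

# The two binary-literal patterns from the timings above.
simple = re.compile(r"0[bB](?:_?[01])+")
unrolled = re.compile(r"0[bB]_?[01]+(?:_[01]+)*")

# Hand-picked samples (an assumption, not an exhaustive test set).
samples = ["0b0101", "0b_0101_0101", "0B1", "0b__1", "0b1_", "0b", "0b12"]

for s in samples:
    a = simple.fullmatch(s) is not None
    b = unrolled.fullmatch(s) is not None
    # Report whether each pattern accepts the whole string.
    print(f"{s!r}: simple={a} unrolled={b}")
```

Both patterns give the same answer for every sample here; they differ in how much backtracking the regex engine has to do, not in what they accept.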

Since the performance of lib2to3 is important, it is better to keep the current 
regexps.

But using \d in Python 3 is a bug: in str patterns it matches any Unicode 
decimal digit, not only ASCII 0-9, so it should be replaced with [0-9]. This 
also speeds up the regex:

$ ./python -m timeit -s 'import re; p = re.compile(r"0[xX]_?[0-9a-fA-F]+(?:_[0-9a-fA-F]+)*[lL]?"); s = "0x_0123_4567_89ab_cdef"' 'p.match(s)'
500000 loops, best of 5: 471 nsec per loop
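
To illustrate the \d problem, here is a small sketch comparing the two hex patterns on a string containing non-ASCII digits. The names `with_d`, `ascii_only` and `s` are only illustrative, and the sample string is an assumption chosen to trigger the difference.

```python
import re

# Hex pattern using \d versus the proposed [0-9] replacement.
with_d = re.compile(r"0[xX]_?[\da-fA-F]+(?:_[\da-fA-F]+)*[lL]?")
ascii_only = re.compile(r"0[xX]_?[0-9a-fA-F]+(?:_[0-9a-fA-F]+)*[lL]?")

# "0x" followed by ARABIC-INDIC DIGIT ONE and TWO: not a valid Python literal.
s = "0x\u0661\u0662"

print(with_d.fullmatch(s) is not None)      # True: \d accepts Unicode digits
print(ascii_only.fullmatch(s) is not None)  # False: only ASCII digits accepted
```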

----------
nosy: +serhiy.storchaka

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue33338>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
