Matthew Barnett added the comment:

It appears that in your tests Python 3.2 is faster with Unicode strings than with bytestrings, and that unpatched Python 3.4 is a lot slower.
I get somewhat different results (Windows XP Pro, 32-bit):

C:\Python32\python.exe -m timeit -s "import re; f = re.compile(b'abc').search; x = b'x'*100000" "f(x)"
1000 loops, best of 3: 449 usec per loop

C:\Python32\python.exe -m timeit -s "import re; f = re.compile('abc').search; x = 'x'*100000" "f(x)"
1000 loops, best of 3: 506 usec per loop

C:\Python32\python.exe -m timeit -s "import re; f = re.compile('abc').search; x = '\u20ac'*100000" "f(x)"
1000 loops, best of 3: 506 usec per loop

C:\Python34\python.exe -m timeit -s "import re; f = re.compile(b'abc').search; x = b'x'*100000" "f(x)"
1000 loops, best of 3: 227 usec per loop

C:\Python34\python.exe -m timeit -s "import re; f = re.compile('abc').search; x = 'x'*100000" "f(x)"
1000 loops, best of 3: 339 usec per loop

C:\Python34\python.exe -m timeit -s "import re; f = re.compile('abc').search; x = '\u20ac'*100000" "f(x)"
1000 loops, best of 3: 504 usec per loop

For comparison, in the regex module I don't duplicate whole sections of code, but instead have a pointer to one of 3 functions (for UCS1, UCS2 and UCS4) that gets the codepoint, except for some tight loops. Doing that might be too much of a change for re.
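To make the dispatch idea concrete, here is a rough sketch in Python rather than C. It is not the regex module's actual code: the names (char_at_ucs1, CHAR_AT, encode_fixed_width, find_literal) are hypothetical, and the fixed-width little-endian buffer only stands in for a string's internal storage. The matching loop is written once, and only the function it calls to fetch a codepoint varies with the character width.

```python
# Illustrative only: all names here are hypothetical, not taken from regex or re.

def char_at_ucs1(buf, i):
    # One byte per codepoint.
    return buf[i]

def char_at_ucs2(buf, i):
    # Two bytes per codepoint, little-endian.
    return int.from_bytes(buf[2 * i:2 * i + 2], "little")

def char_at_ucs4(buf, i):
    # Four bytes per codepoint, little-endian.
    return int.from_bytes(buf[4 * i:4 * i + 4], "little")

# The "pointer to one of 3 functions", keyed by bytes per codepoint.
CHAR_AT = {1: char_at_ucs1, 2: char_at_ucs2, 4: char_at_ucs4}

def encode_fixed_width(text):
    """Return (buffer, width) using the narrowest width that fits every codepoint."""
    top = max(map(ord, text), default=0)
    width = 1 if top < 0x100 else 2 if top < 0x10000 else 4
    buf = b"".join(ord(c).to_bytes(width, "little") for c in text)
    return buf, width

def find_literal(pattern, text):
    """Naive literal search; the loop is shared, only char_at depends on the width."""
    buf, width = encode_fixed_width(text)
    char_at = CHAR_AT[width]          # chosen once per search
    codes = [ord(c) for c in pattern]
    for start in range(len(text) - len(codes) + 1):
        if all(char_at(buf, start + k) == code for k, code in enumerate(codes)):
            return start              # index of the first match
    return -1                         # no match

print(find_literal("abc", "x" * 10 + "abc"))
print(find_literal("abc", "\u20ac" * 10 + "abc"))
```

In this sketch every codepoint fetch goes through an indirect call, which is presumably the kind of overhead that makes it worth keeping separate code for the tightest loops.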
However, the speed appears to be a lot more consistent:

C:\Python32\python.exe -m timeit -s "import regex; f = regex.compile(b'abc').search; x = b'x'*100000" "f(x)"
10000 loops, best of 3: 113 usec per loop

C:\Python32\python.exe -m timeit -s "import regex; f = regex.compile('abc').search; x = 'x'*100000" "f(x)"
10000 loops, best of 3: 113 usec per loop

C:\Python32\python.exe -m timeit -s "import regex; f = regex.compile('abc').search; x = '\u20ac'*100000" "f(x)"
10000 loops, best of 3: 113 usec per loop

C:\Python34\python.exe -m timeit -s "import regex; f = regex.compile(b'abc').search; x = b'x'*100000" "f(x)"
10000 loops, best of 3: 113 usec per loop

C:\Python34\python.exe -m timeit -s "import regex; f = regex.compile('abc').search; x = 'x'*100000" "f(x)"
10000 loops, best of 3: 113 usec per loop

C:\Python34\python.exe -m timeit -s "import regex; f = regex.compile('abc').search; x = '\u20ac'*100000" "f(x)"
10000 loops, best of 3: 113 usec per loop

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue18685>
_______________________________________