Karthikeyan Singaravelan <tir.kar...@gmail.com> added the comment:

Thanks for the report. A couple of points:

* This changes the interface of the function by removing a parameter, so it 
will break compatibility with Python 2 and with earlier versions of Python 3. 
Removing a parameter from the signature has to go through a deprecation cycle 
if this is going to be accepted.
* Please don't use time.time and a mean for benchmarks; the results can be 
misleading. Modules like timeit and perf (https://pypi.org/project/perf/) are 
more reliable.
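
As a rough illustration of the second point, here is a minimal sketch that uses the standard library's timeit module instead of hand-rolled time.time loops. The repeat and loop counts are arbitrary choices for the example, not values from the report, and taking the minimum of the runs follows the suggestion in the timeit documentation.

```python
import base64
import timeit

# Same sample input as in the benchmarks in this comment.
hex_data = "806903d098eb50957b1b376385f233bb3a5d54f54191c8536aefee21fc9ba3ca"

# Assumption: 5 repeats of 20,000 loops each; chosen only to keep the example quick.
LOOPS = 20_000
timer = timeit.Timer(
    stmt="base64.b16decode(hex_data, casefold=True)",
    globals={"base64": base64, "hex_data": hex_data},
)

# repeat() returns the total time in seconds for each run of LOOPS calls.
runs = timer.repeat(repeat=5, number=LOOPS)

# The fastest run is the least disturbed by other activity on the machine.
best_us = min(runs) / LOOPS * 1e6
print(f"best of {len(runs)}: {best_us:.2f} us per call")
```

perf goes further by running the benchmark in several worker processes and reporting the spread, which is why it is used for the numbers in this comment.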

I looked for other inefficiencies and noticed that re.search is called with 
the pattern string on every run. re.compile could be used to store the 
compiled regex at module level and then match against the string. This makes 
the function about 25% faster without changing the interface. When 
casefold=False, the extra call to upper-case the string is also avoided, 
which gives some more benefit.
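
A minimal sketch of the idea, not the actual patch: the names _NON_BASE16 and b16decode_sketch are made up for illustration, and the input handling is simplified compared to the real base64 module. Note that re.search already caches compiled patterns internally, so the saving here comes from skipping that per-call cache lookup.

```python
import binascii
import re

# Hypothetical module-level name: the pattern is compiled once at import time.
_NON_BASE16 = re.compile(b'[^0-9A-F]')


def b16decode_sketch(s, casefold=False):
    """Decode base16 data, keeping the existing casefold parameter."""
    # Simplified input handling (assumption): accept ASCII str or bytes.
    if isinstance(s, str):
        s = s.encode('ascii')
    # The upper() call only happens when the caller asks for case folding.
    if casefold:
        s = s.upper()
    # Reuse the precompiled pattern instead of calling re.search(pattern, s).
    if _NON_BASE16.search(s):
        raise binascii.Error('Non-base16 digit found')
    return binascii.unhexlify(s)
```

Because the signature is unchanged, existing callers and tests would not need to change under this approach.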

With re.search inside the function

$ python3.7 -m perf timeit \
    -s 'import base64; hex_data="806903d098eb50957b1b376385f233bb3a5d54f54191c8536aefee21fc9ba3ca"' \
    'base64.b16decode(hex_data, casefold=True)'
.....................
Mean +- std dev: 3.08 us +- 0.22 us
$ python3.7 -m perf timeit \
    -s 'import base64; hex_data="806903d098eb50957b1b376385f233bb3a5d54f54191c8536aefee21fc9ba3ca".upper()' \
    'base64.b16decode(hex_data)'
.....................
Mean +- std dev: 2.93 us +- 0.20 us

With the regex compiled to a variable at the module level

$ python3.7 -m perf timeit \
    -s 'import base64; hex_data="806903d098eb50957b1b376385f233bb3a5d54f54191c8536aefee21fc9ba3ca"' \
    'base64.b16decode(hex_data, casefold=True)'
.....................
Mean +- std dev: 2.08 us +- 0.15 us
$ python3.7 -m perf timeit \
    -s 'import base64; hex_data="806903d098eb50957b1b376385f233bb3a5d54f54191c8536aefee21fc9ba3ca".upper()' \
    'base64.b16decode(hex_data)'
.....................
Mean +- std dev: 1.98 us +- 0.17 us


Since this is a membership check against a fixed set of characters, I also 
tried a set of the valid characters with any() to short-circuit, but it 
turns out to be slower:

$ python3.7 -m perf timeit \
    -s 'import base64; hex_data="806903d098eb50957b1b376385f233bb3a5d54f54191c8536aefee21fc9ba3ca"' \
    'base64.b16decode(hex_data, casefold=True)'
.....................
Mean +- std dev: 8.21 us +- 0.66 us
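
For reference, the set-based variant was along the following lines. This is a reconstruction under assumptions, not the exact code that was timed: _B16_ALPHABET and b16decode_set_sketch are hypothetical names. The likely reason it is slower is that any() drives a generator that checks one byte at a time in Python, while the regex scans the whole input in C.

```python
import binascii

# Hypothetical name: iterating over bytes yields ints, so this is a set of ints.
_B16_ALPHABET = frozenset(b'0123456789ABCDEF')


def b16decode_set_sketch(s, casefold=False):
    """Same validation as the regex version, using set membership instead."""
    # Simplified input handling (assumption): accept ASCII str or bytes.
    if isinstance(s, str):
        s = s.encode('ascii')
    if casefold:
        s = s.upper()
    # any() stops at the first invalid byte, but each check runs in Python.
    if any(byte not in _B16_ALPHABET for byte in s):
        raise binascii.Error('Non-base16 digit found')
    return binascii.unhexlify(s)
```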


I am opening a PR to use the compiled regex at module level, since I see it 
as a net win of 25-30% with no interface change and no test case changes 
required.

----------
nosy: +xtreak

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue35557>
_______________________________________