Bruno Cauet added the comment:

Here is an updated patch based on Dustin's work with Josh's comments. I also 
added a test which takes forever on an unpatched python interpreter.

Since it's a performance issue, I've benchmarked the results. They don't change 
for the most part (argument is a set or a dict) but they're way better for 
iterables.
For every type of argument I test 1 case where "set.issubset" returns True and 
1 case where it returns False.


(a) simple argument (results unchanged)

$ ./python -m timeit -s "s1 = set(range(1000)); s2 = set(range(1000))" 
"s1.issubset(s2)"
Unpatched: 10000 loops, best of 3: 63.7 usec per loop
Patched:   10000 loops, best of 3: 63.5 usec per loop

$ ./python -m timeit -s "s1 = set(range(1000)); s2 = set(range(1, 1000))" 
"s1.issubset(s2)"
Unpatched: 1000000 loops, best of 3: 0.248 usec per loop
Patched:   1000000 loops, best of 3: 0.25 usec per loop

$ ./python -m timeit -s "s1 = set(range(1000)); s2 = 
dict(enumerate(range(1000)))" "s1.issubset(s2)"
Unpatched: 10000 loops, best of 3: 107 usec per loop
Patched: 10000 loops, best of 3: 108 usec per loop

$ ./python -m timeit -s "s1 = set(range(1000)); s2 = dict(enumerate(range(1, 
1000)))" "s1.issubset(s2)"
Unpatched: 10000 loops, best of 3: 43.5 usec per loop
Patched:   10000 loops, best of 3: 42.6 usec per loop


(b) iterable argument (speed improvement)

1) no improvements/slight degradation when everything must be consumed

$ ./python -m timeit -s "s1 = set(range(1000))" "s1.issubset(range(1000))"
Unpatched: 1000 loops, best of 3: 263 usec per loop
Patched:   1000 loops, best of 3: 263 usec per loop

$ ./python -m timeit -s "s1 = set(range(1000))" "s1.issubset(range(1, 1000))"
Unpatched: 10000 loops, best of 3: 201 usec per loop
Patched:   1000 loops, best of 3: 259 usec per loop

$ ./python -m timeit -s "s1 = set(range(100))" "s1.issubset(range(1, 1000))"
Unpatched: 1000 loops, best of 3: 198 usec per loop
Patched:   1000 loops, best of 3: 218 usec per loop

2) tremendous improvements when it can return early

$ ./python -m timeit -s "s1 = set(range(100))" "s1.issubset(range(1000))"
Unpatched: 1000 loops, best of 3: 209 usec per loop
Patched:   100000 loops, best of 3: 12.1 usec per loop

$ ./python -m timeit -s "s1 = set('a'); s2 = ['a'] + ['b'] * 10000" 
"s1.issubset(s2)"
Unpatched: 1000 loops, best of 3: 368 usec per loop
Patched:   1000000 loops, best of 3: 0.934 usec per loop

$ ./python -m timeit -s "s1 = set('a'); from itertools import repeat" 
"s1.issubset(repeat('a'))"
Unpatched: NEVER FINISHES
Patched:   1000000 loops, best of 3: 1.33 usec per loop

----------
nosy: +bru, haypo
Added file: 
http://bugs.python.org/file37295/0001-Improve-set.issubset-performance-on-iterables.patch

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue18032>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

Reply via email to