Bruno Cauet added the comment: Here is an updated patch based on Dustin's work with Josh's comments. I also added a test which takes forever on an unpatched python interpreter.
Since it's a performance issue, I've benchmarked the results. They don't change for the most part (argument is a set or a dict) but they're way better for iterables. For every type of argument I test 1 case where "set.issubset" returns True and 1 case where it returns False. (a) simple argument (results unchanged) $ ./python -m timeit -s "s1 = set(range(1000)); s2 = set(range(1000))" "s1.issubset(s2)" Unpatched: 10000 loops, best of 3: 63.7 usec per loop Patched: 10000 loops, best of 3: 63.5 usec per loop $ ./python -m timeit -s "s1 = set(range(1000)); s2 = set(range(1, 1000))" "s1.issubset(s2)" Unpatched: 1000000 loops, best of 3: 0.248 usec per loop Patched: 1000000 loops, best of 3: 0.25 usec per loop $ ./python -m timeit -s "s1 = set(range(1000)); s2 = dict(enumerate(range(1000)))" "s1.issubset(s2)" Unpatched: 10000 loops, best of 3: 107 usec per loop Patched: 10000 loops, best of 3: 108 usec per loop $ ./python -m timeit -s "s1 = set(range(1000)); s2 = dict(enumerate(range(1, 1000)))" "s1.issubset(s2)" Unpatched: 10000 loops, best of 3: 43.5 usec per loop Patched: 10000 loops, best of 3: 42.6 usec per loop (b) iterable argument (speed improvement) 1) no improvements/slight degradation when everything must be consumed $ ./python -m timeit -s "s1 = set(range(1000))" "s1.issubset(range(1000))" Unpatched: 1000 loops, best of 3: 263 usec per loop Patched: 1000 loops, best of 3: 263 usec per loop $ ./python -m timeit -s "s1 = set(range(1000))" "s1.issubset(range(1, 1000))" Unpatched: 10000 loops, best of 3: 201 usec per loop Patched: 1000 loops, best of 3: 259 usec per loop $ ./python -m timeit -s "s1 = set(range(100))" "s1.issubset(range(1, 1000))" Unpatched: 1000 loops, best of 3: 198 usec per loop Patched: 1000 loops, best of 3: 218 usec per loop 2) tremendous improvements when it can return early $ ./python -m timeit -s "s1 = set(range(100))" "s1.issubset(range(1000))" Unpatched: 1000 loops, best of 3: 209 usec per loop Patched: 100000 loops, best of 3: 12.1 usec per loop $ ./python -m timeit -s "s1 = set('a'); s2 = ['a'] + ['b'] * 10000" "s1.issubset(s2)" Unpatched: 1000 loops, best of 3: 368 usec per loop Patched: 1000000 loops, best of 3: 0.934 usec per loop $ ./python -m timeit -s "s1 = set('a'); from itertools import repeat" "s1.issubset(repeat('a'))" Unpatched: NEVER FINISHES Patched: 1000000 loops, best of 3: 1.33 usec per loop ---------- nosy: +bru, haypo Added file: http://bugs.python.org/file37295/0001-Improve-set.issubset-performance-on-iterables.patch _______________________________________ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue18032> _______________________________________ _______________________________________________ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com