New submission from Wouter Bolsterlee: The heapq.merge() function merges multiple sorted iterables into a single sorted output. It keeps the head item of each input on a heap queue and repeatedly pops the smallest one until all output has been generated.
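(For readers unfamiliar with the internals, here is a rough, simplified sketch of that heap-based approach; it is not the actual Lib/heapq.py code, and the name merge_sketch is mine.)

import heapq

def merge_sketch(*iterables):
    # Seed the heap with the first item of each non-empty input.
    # The index acts as a tie-breaker, so equal values never cause
    # the iterator objects themselves to be compared.
    heap = []
    for index, iterable in enumerate(iterables):
        it = iter(iterable)
        for head in it:
            heap.append((head, index, it))
            break
    heapq.heapify(heap)
    while heap:
        head, index, it = heap[0]
        yield head
        for head in it:
            # Replace the yielded item with the next one from the same input.
            heapq.heapreplace(heap, (head, index, it))
            break
        else:
            # That input is exhausted; drop its entry from the heap.
            heapq.heappop(heap)

>>> list(merge_sketch([1,3,5,7], [0,2,4,8]))
[0, 1, 2, 3, 4, 5, 7, 8]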
If only a single iterable is passed to heapq.merge(), the heap will have len(h) == 1, which means the looping and heap manipulation just degrade to a convoluted "yield from iterables[0]". This negatively impacts performance. The attached patch adds a short-circuit for this single-input case.

The behaviour of the function remains unchanged:

>>> list(merge_before([1,3,5,7], [0,2,4,8], [5,10,15,20], [], [25]))
[0, 1, 2, 3, 4, 5, 5, 7, 8, 10, 15, 20, 25]
>>> list(merge_after([1,3,5,7], [0,2,4,8], [5,10,15,20], [], [25]))
[0, 1, 2, 3, 4, 5, 5, 7, 8, 10, 15, 20, 25]

For multiple inputs, there is no measurable performance difference (measured using IPython's %timeit).

Without patch:

>>> %timeit list(merge_before([1,3,5,7], [0,2,4,8], [5,10,15,20], [], [25]))
100000 loops, best of 3: 13.7 µs per loop
>>> %timeit list(merge_before([1,3,5,7], [0,2,4,8], [5,10,15,20], [], [25]))
100000 loops, best of 3: 13.4 µs per loop
>>> %timeit list(merge_before([1,3,5,7], [0,2,4,8], [5,10,15,20], [], [25]))
100000 loops, best of 3: 13.5 µs per loop

With patch:

>>> %timeit list(merge_after([1,3,5,7], [0,2,4,8], [5,10,15,20], [], [25]))
100000 loops, best of 3: 13.7 µs per loop
>>> %timeit list(merge_after([1,3,5,7], [0,2,4,8], [5,10,15,20], [], [25]))
100000 loops, best of 3: 13.6 µs per loop
>>> %timeit list(merge_after([1,3,5,7], [0,2,4,8], [5,10,15,20], [], [25]))
100000 loops, best of 3: 13.6 µs per loop
>>> %timeit list(merge_after([1,3,5,7], [0,2,4,8], [5,10,15,20], [], [25]))
100000 loops, best of 3: 13.8 µs per loop

The real performance gain is of course when only a single iterable is passed.

Without patch:

>>> %timeit for x in merge_before(range(10000)): pass
100 loops, best of 3: 2.65 ms per loop
>>> %timeit for x in merge_before(range(10000)): pass
100 loops, best of 3: 2.52 ms per loop
>>> %timeit for x in merge_before(range(10000)): pass
100 loops, best of 3: 2.51 ms per loop

With patch:

>>> %timeit for x in merge_after(range(10000)): pass
1000 loops, best of 3: 604 µs per loop
>>> %timeit for x in merge_after(range(10000)): pass
1000 loops, best of 3: 609 µs per loop
>>> %timeit for x in merge_after(range(10000)): pass
1000 loops, best of 3: 603 µs per loop

This is a ~4x speedup for an input consisting of 10000 items.

Compared to plain iteration:

>>> %timeit for x in range(10000): pass
1000 loops, best of 3: 263 µs per loop
>>> %timeit for x in range(10000): pass
1000 loops, best of 3: 268 µs per loop
>>> %timeit for x in range(10000): pass
1000 loops, best of 3: 273 µs per loop

Without the patch, heapq.merge() is ~10x slower than plain iteration. With the patch, this is reduced to ~2.5x.

(Note: all testing was done using Python 3.3.2 on a 64-bit Linux machine.)

----------
components: Library (Lib)
files: heapq-merge-single-input-optimization.patch
keywords: patch
messages: 197174
nosy: wbolster
priority: normal
severity: normal
status: open
title: Add special case for single iterator in heapq.merge function
type: performance
versions: Python 3.3
Added file: http://bugs.python.org/file31649/heapq-merge-single-input-optimization.patch

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue18962>
_______________________________________
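(For illustration only: the kind of short-circuit described above could look roughly like the sketch below. The name merge_with_shortcut and the fallback to heapq.merge() are mine; the attached patch may differ in detail.)

import heapq

def merge_with_shortcut(*iterables):
    if len(iterables) == 1:
        # A single input is already sorted, so the heap machinery adds
        # nothing: just pass its items straight through.
        yield from iterables[0]
        return
    # Otherwise defer to the regular heap-based merge.
    yield from heapq.merge(*iterables)

Skipping the per-item tuple packing and heap bookkeeping for the single-input case is where the ~4x speedup reported above comes from.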