New submission from Wouter Bolsterlee: The heapq.merge() function merges multiple sorted iterables into a single sorted output. It keeps the head item of each input on a heap queue and repeatedly pops the smallest one until all output has been generated.
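(For readers unfamiliar with the internals, here is a rough, simplified sketch of that heap-based approach; it is not the actual Lib/heapq.py code, and the name merge_sketch is mine.)

import heapq

def merge_sketch(*iterables):
    # Seed the heap with the first item of each non-empty input.
    # The index acts as a tie-breaker, so equal values never cause
    # the iterator objects themselves to be compared.
    heap = []
    for index, iterable in enumerate(iterables):
        it = iter(iterable)
        for head in it:
            heap.append((head, index, it))
            break
    heapq.heapify(heap)
    while heap:
        head, index, it = heap[0]
        yield head
        for head in it:
            # Replace the yielded item with the next one from the same input.
            heapq.heapreplace(heap, (head, index, it))
            break
        else:
            # That input is exhausted; drop its entry from the heap.
            heapq.heappop(heap)

>>> list(merge_sketch([1,3,5,7], [0,2,4,8]))
[0, 1, 2, 3, 4, 5, 7, 8]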
If only a single iterable is passed to heapq.merge(), the heap will have len(h) == 1, which means the looping and heap manipulation just degrade to a convoluted "yield from iterables[0]". This negatively impacts performance. The attached patch adds a short-circuit for this single-input case.

The behaviour of the function remains unchanged:

>>> list(merge_before([1,3,5,7], [0,2,4,8], [5,10,15,20], [], [25]))
[0, 1, 2, 3, 4, 5, 5, 7, 8, 10, 15, 20, 25]
>>> list(merge_after([1,3,5,7], [0,2,4,8], [5,10,15,20], [], [25]))
[0, 1, 2, 3, 4, 5, 5, 7, 8, 10, 15, 20, 25]

For multiple inputs, there is no measurable performance difference (measured using IPython's %timeit).

Without patch:

>>> %timeit list(merge_before([1,3,5,7], [0,2,4,8], [5,10,15,20], [], [25]))
100000 loops, best of 3: 13.7 µs per loop
>>> %timeit list(merge_before([1,3,5,7], [0,2,4,8], [5,10,15,20], [], [25]))
100000 loops, best of 3: 13.4 µs per loop
>>> %timeit list(merge_before([1,3,5,7], [0,2,4,8], [5,10,15,20], [], [25]))
100000 loops, best of 3: 13.5 µs per loop

With patch:

>>> %timeit list(merge_after([1,3,5,7], [0,2,4,8], [5,10,15,20], [], [25]))
100000 loops, best of 3: 13.7 µs per loop
>>> %timeit list(merge_after([1,3,5,7], [0,2,4,8], [5,10,15,20], [], [25]))
100000 loops, best of 3: 13.6 µs per loop
>>> %timeit list(merge_after([1,3,5,7], [0,2,4,8], [5,10,15,20], [], [25]))
100000 loops, best of 3: 13.6 µs per loop
>>> %timeit list(merge_after([1,3,5,7], [0,2,4,8], [5,10,15,20], [], [25]))
100000 loops, best of 3: 13.8 µs per loop

The real performance gain is of course when only a single iterable is passed.

Without patch:

>>> %timeit for x in merge_before(range(10000)): pass
100 loops, best of 3: 2.65 ms per loop
>>> %timeit for x in merge_before(range(10000)): pass
100 loops, best of 3: 2.52 ms per loop
>>> %timeit for x in merge_before(range(10000)): pass
100 loops, best of 3: 2.51 ms per loop

With patch:

>>> %timeit for x in merge_after(range(10000)): pass
1000 loops, best of 3: 604 µs per loop
>>> %timeit for x in merge_after(range(10000)): pass
1000 loops, best of 3: 609 µs per loop
>>> %timeit for x in merge_after(range(10000)): pass
1000 loops, best of 3: 603 µs per loop

This is a ~4x speedup for an input consisting of 10000 items.

Compared to plain iteration:

>>> %timeit for x in range(10000): pass
1000 loops, best of 3: 263 µs per loop
>>> %timeit for x in range(10000): pass
1000 loops, best of 3: 268 µs per loop
>>> %timeit for x in range(10000): pass
1000 loops, best of 3: 273 µs per loop

Without the patch, heapq.merge() is ~10x slower than plain iteration. With the patch, this is reduced to ~2.5x.

(Note: all testing was done using Python 3.3.2 on a 64-bit Linux machine.)

----------
components: Library (Lib)
files: heapq-merge-single-input-optimization.patch
keywords: patch
messages: 197174
nosy: wbolster
priority: normal
severity: normal
status: open
title: Add special case for single iterator in heapq.merge function
type: performance
versions: Python 3.3
Added file: http://bugs.python.org/file31649/heapq-merge-single-input-optimization.patch

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue18962>
_______________________________________
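(For illustration only: the kind of short-circuit described above could look roughly like the sketch below. The name merge_with_shortcut and the fallback to heapq.merge() are mine; the attached patch may differ in detail.)

import heapq

def merge_with_shortcut(*iterables):
    if len(iterables) == 1:
        # A single input is already sorted, so the heap machinery adds
        # nothing: just pass its items straight through.
        yield from iterables[0]
        return
    # Otherwise defer to the regular heap-based merge.
    yield from heapq.merge(*iterables)

Skipping the per-item tuple packing and heap bookkeeping for the single-input case is where the ~4x speedup reported above comes from.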