New submission from newacct <spoon.reloa...@gmail.com>:

heapq.nlargest()/nsmallest() currently use a size-k heap to get the k 
largest/smallest items. This takes O(n log k): heapifying the first k elements 
is O(k); each of the remaining (n-k) elements costs an O(log k) heap 
replacement, for O((n-k) log k); and sorting the k results at the end is 
O(k log k).
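
For reference, the current strategy looks roughly like this (an illustrative 
Python sketch of the algorithm described above, not the actual heapq 
implementation; nlargest shown, and the helper name is made up):

import heapq
from itertools import islice

def nlargest_sketch(k, iterable):
    # Size-k heap strategy: keep the k best candidates seen so far.
    if k <= 0:
        return []
    it = iter(iterable)
    heap = list(islice(it, k))      # first k elements
    heapq.heapify(heap)             # O(k); root is the smallest candidate
    for item in it:                 # remaining n - k elements
        if item > heap[0]:          # does it beat the weakest candidate?
            heapq.heapreplace(heap, item)   # O(log k) pop+push
    heap.sort(reverse=True)         # O(k log k) final sort
    return heap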

There are at least two more efficient ways to do this:

1) O(n + k log n): Put the entire list into a heap at once; heapifying all n 
elements takes only O(n). Then take the k largest/smallest out of the heap in 
sorted order, each pop costing O(log n), for a total of O(n + k log n), which 
is much better than O(n log k) when k is small and n is big. Unfortunately, 
this takes O(n) extra memory for the heap. (See the first sketch after this 
list.)

2) O(n + k log k): Use a linear-time (O(n)) selection algorithm to find the 
k'th largest/smallest element. 
(http://en.wikipedia.org/wiki/Selection_algorithm#Linear_general_selection_algorithm_-_Median_of_Medians_algorithm)
 Then partition the elements, a la Quicksort, into those smaller and those 
larger than it; this is also O(n), and the partitioning may even fall out of 
the selection algorithm itself. Finally, sort the k selected elements, for 
O(k log k). If we drop the requirement that the result be sorted, this 
algorithm is just O(n). (See the second sketch after this list.)
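
A minimal sketch of variant 1, using only the existing heapq primitives 
(nsmallest shown; the helper name is made up):

import heapq

def nsmallest_heapify_all(k, iterable):
    # Variant 1: heapify everything (O(n)), then pop k times
    # (O(log n) each), for O(n + k log n) total -- but the heap
    # is an O(n) copy of the input.
    heap = list(iterable)
    heapq.heapify(heap)
    return [heapq.heappop(heap) for _ in range(min(k, len(heap)))]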
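And a sketch of variant 2. For brevity it uses randomized quickselect 
(expected O(n)) instead of the worst-case-linear median-of-medians algorithm 
linked above; the Quicksort-style partitioning happens in place, so after the 
loop the k smallest items sit in a[:k] (helper name again made up):

import random

def nsmallest_select(k, iterable):
    # Variant 2: select the k'th smallest, partitioning as we go,
    # then sort only the k survivors.
    a = list(iterable)
    if k <= 0:
        return []
    if k >= len(a):
        return sorted(a)
    lo, hi = 0, len(a) - 1
    while lo < hi:
        # Hoare-style partition around a random pivot.
        pivot = a[random.randint(lo, hi)]
        i, j = lo, hi
        while i <= j:
            while a[i] < pivot:
                i += 1
            while a[j] > pivot:
                j -= 1
            if i <= j:
                a[i], a[j] = a[j], a[i]
                i += 1
                j -= 1
        # Narrow to the side containing index k-1.
        if k - 1 <= j:
            hi = j
        elif k - 1 >= i:
            lo = i
        else:
            break       # a[:k] already holds the k smallest
    return sorted(a[:k])    # O(k log k); skip if order doesn't matter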

----------
messages: 128357
nosy: newacct
priority: normal
severity: normal
status: open
title: More efficient nlargest()/nsmallest()
type: performance

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue11180>
_______________________________________