Roy Smith wrote:
> In article <xns9e59a44d7cc49duncanbo...@127.0.0.1>,
>  Duncan Booth <duncan.bo...@invalid.invalid> wrote:
>
>> Roy Smith <r...@panix.com> wrote:
>>
>> > I'm processing a stream of N numbers and want to keep track of the K
>> > largest.  There's too many numbers in the stream (i.e. N is too large)
>> > to keep in memory at once.  K is small (100 would be typical).
>> > ...
>> > Is there a better way to do this, either from a theoretical running
>> > time point of view, or just a nicer way to code this in Python?
>>
>> How about:
>>
>>     from heapq import nlargest
>>     top = nlargest(K, input())
>>
>> It uses a heap so avoids completely resorting each time.
>
> Hmmm, that looks like it would solve the problem as stated, but there's
> another twist.  In addition to finding the K largest values, I *also*
> need to do some other processing on all the values (which I didn't show
> in the original example, incorrectly thinking it not germane to my
> question).  The problem with nlargest() is that it doesn't give me a
> hook to do that.
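One way to get that hook without giving up nlargest() is to wrap the stream in a generator that performs the extra per-item work before yielding each value. This is a sketch, not from the thread; process() is a hypothetical stand-in for the OP's unshown processing:

```python
import heapq

seen = []  # record of the side processing, purely for illustration

def process(item):
    # hypothetical stand-in for the OP's extra per-item work
    seen.append(item)

def observed(stream):
    # generator wrapper: do the side work, then pass each item through
    for item in stream:
        process(item)
        yield item

top3 = heapq.nlargest(3, observed(iter([5, 1, 9, 3, 7])))
# top3 == [9, 7, 5]; seen now holds all five values
```

Since nlargest() consumes the wrapped iterator lazily, the side processing still happens in a single pass over the stream.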
If Paul's solution doesn't suffice -- the heapq module has the building
blocks for a custom solution, e.g.:

import heapq
from functools import partial

class NLargest(object):
    def __init__(self, n):
        self.n = n
        self.heap = []

    def tally(self, item):
        heap = self.heap
        if len(heap) >= self.n:
            heapq.heappushpop(heap, item)
            self.tally = partial(heapq.heappushpop, heap)
        else:
            heapq.heappush(heap, item)

    def __str__(self):
        return str(sorted(self.heap, reverse=True))

if __name__ == "__main__":
    import random
    items = range(100)
    random.shuffle(items)
    accu = NLargest(10)
    for item in items:
        accu.tally(item)
        print item, accu

> PS: I'm assuming heapq.nlargest(n, iterable) operates with memory
> proportional to n, and not to the iterator length.  That's the only
> reasonable conclusion, but the docs don't actually come out and say it.

I would hope so.

--
http://mail.python.org/mailman/listinfo/python-list
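The code above is Python 2 (print statement, list-returning range). For readers on Python 3, the same idea can be sketched like this; the result() method is an addition for convenience, not part of the original post:

```python
import heapq
from functools import partial

class NLargest:
    """Track the n largest items seen so far (Python 3 sketch)."""

    def __init__(self, n):
        self.n = n
        self.heap = []

    def tally(self, item):
        heap = self.heap
        if len(heap) >= self.n:
            # heap is full: push the new item, drop the current smallest
            heapq.heappushpop(heap, item)
            # rebind tally so later calls skip the length check entirely
            self.tally = partial(heapq.heappushpop, heap)
        else:
            heapq.heappush(heap, item)

    def result(self):
        # largest first, like heapq.nlargest
        return sorted(self.heap, reverse=True)

accu = NLargest(3)
for item in [4, 1, 7, 3, 9, 2]:
    accu.tally(item)
# accu.result() == [9, 7, 4]
```

The self.tally rebinding is the same trick as in the original: once the heap reaches size n, every subsequent call goes straight to heappushpop, which returns the incoming item unchanged whenever it is not larger than the current minimum.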