Keeping track of the N largest values

Roy Smith Sat, 25 Dec 2010 07:48:12 -0800

I'm processing a stream of N numbers and want to keep track of the K 
largest.  There's too many numbers in the stream (i.e. N is too large) 
to keep in memory at once.  K is small (100 would be typical).


>From a theoretical point of view, I should be able to do this in N log K 
time.  What I'm doing now is essentially:

top = [-1]    # Assume all x are >= 0                                           
                    
for x in input():
    if x <= top[0]:
        continue
    top.append(x)
    if len(top) > K:
        top.sort()
        top.pop(0)

I can see pathological cases (say, all input values the same) where 
running time would be N K log K, but on average (N >> K and random 
distribution of values), this should be pretty close to N.

Is there a better way to do this, either from a theoretical running time 
point of view, or just a nicer way to code this in Python?
-- 
http://mail.python.org/mailman/listinfo/python-list

Keeping track of the N largest values

Reply via email to