Re: Keeping track of the N largest values

Stefan Sonnenberg-Carstens Sun, 26 Dec 2010 11:09:30 -0800

Am 25.12.2010 16:42, schrieb Roy Smith:

I'm processing a stream of N numbers and want to keep track of the K
largest.  There's too many numbers in the stream (i.e. N is too large)
to keep in memory at once.  K is small (100 would be typical).


> From a theoretical point of view, I should be able to do this in N log K
time.  What I'm doing now is essentially:

top = [-1]    # Assume all x are>= 0
for x in input():
     if x<= top[0]:
         continue
     top.append(x)
     if len(top)>  K:
         top.sort()
         top.pop(0)

I can see pathological cases (say, all input values the same) where
running time would be N K log K, but on average (N>>  K and random
distribution of values), this should be pretty close to N.

Is there a better way to do this, either from a theoretical running time
point of view, or just a nicer way to code this in Python?

Here is my version:

l = []
K = 10

while 1:
    a = input()
    if len(l) == K:
        l.remove(min(l))
    l=[x for x in l if x < a] + [a] + [x for x in l if x > a]
    print l
--
http://mail.python.org/mailman/listinfo/python-list

Re: Keeping track of the N largest values

Reply via email to