In article <xns9e59a44d7cc49duncanbo...@127.0.0.1>, Duncan Booth <duncan.bo...@invalid.invalid> wrote:
> Roy Smith <r...@panix.com> wrote:
> >
> > I'm processing a stream of N numbers and want to keep track of the K
> > largest. There's too many numbers in the stream (i.e. N is too large)
> > to keep in memory at once. K is small (100 would be typical).
> >
> > ...
> >
> > Is there a better way to do this, either from a theoretical running
> > time point of view, or just a nicer way to code this in Python?
>
> How about:
>
> from heapq import nlargest
> top = nlargest(K, input())
>
> It uses a heap so avoids completely resorting each time.

Hmmm, that looks like it would solve the problem as stated, but there's
another twist. In addition to finding the K largest values, I *also*
need to do some other processing on all the values (which I didn't show
in the original example, incorrectly thinking it not germane to my
question). The problem with nlargest() is that it doesn't give me a
hook to do that.

PS: I'm assuming heapq.nlargest(n, iterable) operates with memory
proportional to n, and not to the iterator length. That's the only
reasonable conclusion, but the docs don't actually come out and say it.
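
For what it's worth, here is a rough sketch of two ways to get that hook
while still keeping only K values in memory. Names like process(), numbers,
and K are placeholders, not anything from the original post: either wrap the
stream in a generator that does the extra work as a side effect and feed
that wrapper to nlargest(), or drive a small heap by hand with heapq.

    import heapq
    from heapq import nlargest

    K = 100  # e.g. keep the 100 largest

    def process(x):
        """Stand-in (hypothetical) for the extra per-value work."""
        pass

    # Option 1: a generator that touches every value, then yields it on.
    # nlargest() consumes the wrapper lazily, so it still holds only K items.
    def observed(stream):
        for x in stream:
            process(x)          # the "hook" nlargest() doesn't expose directly
            yield x

    # top = nlargest(K, observed(numbers))

    # Option 2: maintain the heap by hand, which makes the hook explicit.
    def top_k(stream, k):
        heap = []                           # min-heap of the k largest so far
        for x in stream:
            process(x)                      # extra processing on every value
            if len(heap) < k:
                heapq.heappush(heap, x)
            elif x > heap[0]:               # heap[0] is the smallest kept value
                heapq.heapreplace(heap, x)  # pop the smallest, push x
        return sorted(heap, reverse=True)

Either way memory stays proportional to K: heapq.heapreplace() pops the
current minimum and pushes the new value in a single step, so the heap
never grows past k entries.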