John Machin" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] > On 14/06/2006 8:38 AM, Robert Kern wrote: > > Gary Herron wrote: > >> John Machin wrote: > >> > >>> On 13/06/2006 6:28 PM, Paul McGuire wrote: > >>> > >>>> (Oh, and I like groupby too! Combine it with sort to quickly create > >>>> histograms.) > >>>> > >>>> # tally a histogram of a list of values from 1-10 > >>>> dataValueRange = range(1,11) > >>>> data = [random.choice(dataValueRange) for i in xrange(10000)] > >>>> > >>>> hist = [ (k,len(list(g))) for k,g in itertools.groupby(sorted(data)) ] > >>> That len(list(g)) looks like it uses O(N) memory just to find out what N > >>> is :-( > >> Not at all! A python list *knows* its length at all times. len() is a > >> constant time lookup of an internal attribute. > > > > The point is that you had to create the list in the first place. g is an iterator. > > > > I didn't have to create a list in the first place. Paul did. The point > of my post was to avoid the memory grab caused by list(g) by seeking a > way that just counted g's output. > > Sorry for the confusion my lack of clarity has evidently caused. I'll > rephrase: > > That whole construct > len(list(g)) > looks like it uses O(N) memory just to find out what N is. > Better?
Ok, ok!!! Here's a non-O(N) memory allocation, using a generator expression to count the number of items in the list. hist = [ (k,sum(1 for _g in g)) for k,g in itertools.groupby(sorted(data)) ] -- Paul -- http://mail.python.org/mailman/listinfo/python-list