On Fri, Jan 16, 2009 at 8:39 AM, Per Freem <perfr...@yahoo.com> wrote:
> hello
>
> i have an optimization question about python. i am iterating through
> a file and counting the number of repeated elements. the file has on
> the order of tens of millions of elements...
>
> for line in file:
>     try:
>         elt = MyClass(line)  # extract elt from line...
>         my_dict[elt] += 1
>     except KeyError:
>         my_dict[elt] = 1
>
> class MyClass:
>
>     def __str__(self):
>         return "%s-%s-%s" % (self.field1, self.field2, self.field3)
>
>     def __repr__(self):
>         return str(self)
>
>     def __hash__(self):
>         return hash(str(self))
>
> is there anything that can be done to speed up this simple code? right
> now it is taking well over 15 minutes to process, on a 3 GHz machine
> with lots of RAM (though this is all taking CPU power, not RAM at this
> point.)
>
> any general advice on how to optimize large dicts would be great too
>
> thanks for your help.

Hello,

You can get a large speedup by removing the need to instantiate a new
MyClass instance on each iteration of your loop. Instead, define one
MyClass with an 'interpret' method that is called in place of
MyClass(); interpret would return the string '%s-%s-%s' % (self.field1,
etc.). I.e.:

    myclass = MyClass()
    interpret = myclass.interpret

    for line in file:
        elt = interpret(line)  # extract elt from line...
        try:
            my_dict[elt] += 1
        except KeyError:
            my_dict[elt] = 1

The speedup is on the order of 10 on my machine.

Cheers,
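P.S. If the MyClass instance is only used to build that "%s-%s-%s" key
string, you could skip the class entirely and count with
collections.defaultdict, which also removes the try/except. A rough
sketch (assuming the three fields are whitespace-separated; adjust the
parsing to whatever your lines actually contain):

    import collections

    my_dict = collections.defaultdict(int)
    for line in file:
        # Assumption: fields are whitespace-separated; replace this
        # with the real parsing for your line format.
        field1, field2, field3 = line.split()[:3]
        # A tuple key avoids building a new string for every line.
        my_dict[(field1, field2, field3)] += 1

defaultdict has been in the standard library since Python 2.5.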
--
http://mail.python.org/mailman/listinfo/python-list