Hello, I have an optimization question about Python. I am iterating through a file and counting the number of repeated elements. The file has on the order of tens of millions of elements...
I create a dictionary that maps the elements of the file that I want to count to their number of occurrences. So I iterate through the file, extract the element from each line (a simple text operation), and check whether it already has an entry in the dict:

    for line in file:
        elt = MyClass(line)  # extract elt from line
        try:
            my_dict[elt] += 1
        except KeyError:
            my_dict[elt] = 1

I am using try/except since it is supposedly faster, though I am not sure about this. Is that really true in Python 2.5?

The only 'twist' is that my elt is an instance of a class (MyClass) with 3 fields, all numeric. The class is hashable, and so my_dict[elt] works well. The __repr__ and __hash__ methods of my class simply return the str() representation of self, while __str__ just concatenates the numeric fields into one string:

    class MyClass:
        # __init__ parses the line into field1, field2, field3 (omitted here)
        def __str__(self):
            return "%s-%s-%s" % (self.field1, self.field2, self.field3)
        def __repr__(self):
            return str(self)
        def __hash__(self):
            return hash(str(self))

Is there anything that can be done to speed up this simple code? Right now it is taking well over 15 minutes to process on a 3 GHz machine with lots of RAM (and it is CPU-bound, not memory-bound, at this point).

Any general advice on how to optimize large dicts would be great too. Thanks for your help.
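P.S. For comparison, here are the two alternatives to try/except that I was planning to benchmark: one using dict.get with a default, and one using collections.defaultdict (which I believe is new in 2.5, so it should be available to me). These are rough sketches reusing the same file object and MyClass as above:

    import collections

    # variant 1: dict.get supplies a default, so no exception machinery
    my_dict = {}
    for line in file:
        elt = MyClass(line)
        my_dict[elt] = my_dict.get(elt, 0) + 1

    # variant 2: defaultdict(int) makes missing keys start at 0
    my_dict = collections.defaultdict(int)
    for line in file:
        elt = MyClass(line)
        my_dict[elt] += 1

I don't know which of the three is actually fastest on tens of millions of lines; that is part of what I am asking.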
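P.P.S. One thing I wondered about myself: my __hash__ builds a brand-new string on every lookup. A sketch of hashing the field tuple directly instead (assuming my existing __init__, and spelling out the __eq__ that dict lookups also depend on):

    class MyClass:
        # __init__ as before: sets self.field1, self.field2, self.field3

        def __hash__(self):
            # hash the numeric fields directly; no throwaway string
            return hash((self.field1, self.field2, self.field3))

        def __eq__(self, other):
            # compare the same fields, so equal elements share a dict slot
            return (self.field1, self.field2, self.field3) == \
                   (other.field1, other.field2, other.field3)

I suppose I could even skip the instances for the counting step and just use the (field1, field2, field3) tuples themselves as dict keys. No idea if either change helps in practice, but it seemed worth mentioning in case the string building is the bottleneck.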