Matimus, your suggestions are all good. Try-except is slower than: if x in adict: ... else: ... A defaultdict is generally faster (there are some conditions when it isn't, but they aren't very common; I think it's when the ratio of duplicates is really low). Creating just a tuple instead of a class instance helps a lot, and when the CPU/OS allow it, Psyco may help somewhat here too.
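To make the comparison concrete, here is a minimal sketch of the three idioms. It assumes the task is counting how often each key occurs, which may not match the original code exactly. The function names and the time_all helper are hypothetical, made up for this illustration, and the timings will depend on your data, especially on the ratio of duplicates.

```python
import timeit
from collections import defaultdict


def count_try(keys):
    # try/except: cheap when the key is already present,
    # costly each time a KeyError is actually raised.
    counts = {}
    for k in keys:
        try:
            counts[k] += 1
        except KeyError:
            counts[k] = 1
    return counts


def count_in(keys):
    # Explicit membership test before updating.
    counts = {}
    for k in keys:
        if k in counts:
            counts[k] += 1
        else:
            counts[k] = 1
    return counts


def count_defaultdict(keys):
    # defaultdict(int) creates a missing entry as 0 on first access.
    counts = defaultdict(int)
    for k in keys:
        counts[k] += 1
    return counts


def time_all(keys, number=10):
    # Hypothetical helper: seconds taken by each variant on the same data.
    funcs = (count_try, count_in, count_defaultdict)
    return {f.__name__: timeit.timeit(lambda: f(keys), number=number)
            for f in funcs}
```

Calling time_all on a list with many duplicates and again on a list with almost none should show whether the defaultdict advantage holds for your input.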
If the resulting speed still isn't enough, consider that Python dicts are quite fast, so you may need a lot of care to write D/C/C++/Clisp code that's faster for this problem. I also suspect that Python dicts lose some of their efficiency when they become very large. If this is true, you could switch to a new dictionary for every chunk of the file, and then merge the dicts at the end. I don't actually know whether this would speed up your Python code any further (if you try this experiment, I'd like to know the result, even if it turns out to be false). Bye, bearophile