1. Why is two minutes to insert 5M keys "bad" for you? What would be "good"? What would good/bad look-up times be? Have you measured the typical look-up time? How often does the dict need to be created? How often does the data change? Is multi-user access required for (a) look-up (b) updating? Have you considered loading the dict from a pickle? (See the sketch after this list.)

2. Assuming your code that creates the dict looks in essence like this:

    adict = {}
    for k, v in some_iterable:
        adict[k] = v

then any non-linear behaviour can only be in the actual CPython insertion code. Psyco can't help you there. Psyco *may* help with the linear part, *if* you have enough memory. What are the corresponding times without Psyco? (A bare timing harness is sketched after this list.) In any case, if your code isn't (conceptually) that simple, try cutting away the cruft and measuring again.

3. Which version of Python? Which OS? OK, Psyco implies Intel x86, but which chip exactly? How much free memory?

4. Consider printing time-so-far results, say every 100K keys (see the progress sketch below). Multiple step-ups might indicate dict resizings. A dog-leg probably means running out of memory. Why "roughly" 5M keys???

5. How large are your long integers?

6. What is the nature of the value associated with each key?

7. Have you experimented with key = a * 2 ** 32 + b instead of key = (a, b)? (A small comparison is sketched below.)
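Re point 1, a minimal pickle sketch (untested; the filename 'bigdict.pkl' is made up, and cPickle is the usual choice on 2.x for speed):

    import cPickle as pickle

    # dump once, after the expensive build:
    f = open('bigdict.pkl', 'wb')
    pickle.dump(adict, f, pickle.HIGHEST_PROTOCOL)
    f.close()

    # ... later, reload instead of rebuilding:
    f = open('bigdict.pkl', 'rb')
    adict = pickle.load(f)
    f.close()

Reloading a pickled dict is typically much faster than rebuilding it from the raw data, provided the values pickle cheaply.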
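Re point 2, to get the Psyco-free baseline, time just the bare loop, e.g. (assuming your source really is some_iterable):

    import time

    t0 = time.time()
    adict = {}
    for k, v in some_iterable:
        adict[k] = v
    print 'built %d keys in %.1f s' % (len(adict), time.time() - t0)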
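Re point 4, a sketch of the progressive timing (the 100K print interval is arbitrary):

    import time

    adict = {}
    t0 = time.time()
    n = 0
    for k, v in some_iterable:
        adict[k] = v
        n += 1
        if not n % 100000:
            print '%8d keys %6.1f s' % (n, time.time() - t0)

If the per-100K increments are roughly constant, the build is linear; isolated jumps line up with dict resizes; a sustained slowdown near the end suggests swapping.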
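Re point 7, the comparison I have in mind (assuming a and b are non-negative and b < 2 ** 32, so the packed keys can't collide):

    # tuple key: hashes the tuple, which hashes both elements
    adict[(a, b)] = value

    # packed key: one int/long, one hash
    adict[a * 2 ** 32 + b] = value

The packed form also avoids allocating a 2-tuple per key, which can matter for memory with 5M entries.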
HTH,
John