See end for solution. > >> (3) Are you sure you need all eight-million-plus items in the cache > >> all at once? > > > > Yes. > > I remain skeptical, but what do I know, I don't even know what you're > doing with the data once you have it :-)
It's OK, I'd be skeptical too. ;) > $ cat /proc/cpuinfo > processor : 0 > vendor_id : AuthenticAMD > cpu family : 15 > model : 107 > model name : AMD Athlon(tm) 64 X2 Dual Core Processor 4400+ > stepping : 1 > cpu MHz : 1000.000 > cache size : 512 KB > ... > processor : 1 > vendor_id : AuthenticAMD > cpu family : 15 > model : 107 > model name : AMD Athlon(tm) 64 X2 Dual Core Processor 4400+ > stepping : 1 > cpu MHz : 1000.000 > cache size : 512 KB # cat /proc/cpuinfo processor : 0 vendor_id : AuthenticAMD cpu family : 15 model : 5 model name : AMD Opteron(tm) Processor 246 stepping : 10 cpu MHz : 2009.305 cache size : 1024 KB fpu : yes fpu_exception : yes cpuid level : 1 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext lm 3dnowext 3dnow bogomips : 3981.31 TLB size : 1088 4K pages clflush size : 64 cache_alignment : 64 address sizes : 40 bits physical, 48 bits virtual power management: ts fid vid ttp processor : 1 vendor_id : AuthenticAMD cpu family : 15 model : 5 model name : AMD Opteron(tm) Processor 246 stepping : 10 cpu MHz : 2009.305 cache size : 1024 KB fpu : yes fpu_exception : yes cpuid level : 1 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext lm 3dnowext 3dnow bogomips : 4014.08 TLB size : 1088 4K pages clflush size : 64 cache_alignment : 64 address sizes : 40 bits physical, 48 bits virtual power management: ts fid vid ttp As for the solution, after trying a half-dozen different integer hashing functions and hash table sizes (the brute force approach), on a total whim I switched to a model with two dictionary tiers and got whole orders of magnitude better performance. The tiering is, for a given key of type long: id2name[key >> 40][key & 0x10000000000] = name Much, much better. A few minutes versus hours this way. I suspect it could be brought down to seconds with a third level of tiers but this is no longer posing the biggest bottleneck... ;) Thanks! -- http://mail.python.org/mailman/listinfo/python-list