Lothar Werzinger wrote:
> I am trying to load files into a dictionary for analysis. The size of the
> dictionary will grow quite large (several million entries) and as
> inserting into a dictionary is roughly O(n) I figured that if I loaded each
> file into its own dictionary it would speed things up. However, it did
> not.
>
> So I decided to write a small test program (attached).
>
> As you can see, I am inserting one million entries at a time into a map. I ran
> the tests where I put all three million entries into one map and one where
> I put one million each into its own map.
>
> What I would have expected is that if I insert one million into its own
> map the time to do that would be roughly constant for each map.
> Interestingly, it is not. It's about the same as if I load everything into
> one map.
>
> Oh, and I have 4G of RAM and the test consumes about 40% at its max. I
> even ran the test on one of our servers with 64G of RAM, so I can rule out
> swapping as the issue.
>
> Can anyone explain this oddity? Any insight is highly appreciated.
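[For context, a minimal sketch of the kind of timing test described above might look like the following. The actual attached script is not reproduced here; key types, batch sizes, and structure are assumptions.]

    # Minimal sketch (not the attached script): time three batches of one
    # million inserts, first into one shared dict, then into separate dicts.
    import time

    N = 1000000

    def fill(d, start):
        # insert N entries with distinct integer keys
        for i in range(start, start + N):
            d[i] = i

    def one_big_dict():
        d = {}
        for batch in range(3):
            t0 = time.time()
            fill(d, batch * N)
            print("batch %d, shared dict: %.2fs" % (batch, time.time() - t0))

    def separate_dicts():
        for batch in range(3):
            d = {}
            t0 = time.time()
            fill(d, batch * N)
            print("batch %d, own dict: %.2fs" % (batch, time.time() - t0))

    one_big_dict()
    separate_dicts()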
When you are creating objects like there is no tomorrow, Python's cyclic garbage collection often takes a significant amount of time. The first thing I'd try is therefore switching it off with

    import gc
    gc.disable()

Peter
--
http://mail.python.org/mailman/listinfo/python-list
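[A minimal sketch of how that suggestion could be applied to the loading step; load_file() is a hypothetical placeholder for whatever populates the dictionary, and only the gc calls are the point.]

    # Disable cyclic garbage collection while bulk-loading, then restore it.
    import gc

    def load_all(filenames):
        data = {}
        gc.disable()        # skip cyclic collection during the bulk inserts
        try:
            for name in filenames:
                load_file(name, data)   # hypothetical loader filling the dict
        finally:
            gc.enable()     # restore normal collection afterwards
        return data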