Roy Smith wrote:
> In article <[EMAIL PROTECTED]>, Chris Foote <[EMAIL PROTECTED]>
> wrote:
>
>> I have the need to store a large (10M) number of keys in a hash table,
>> based on a tuple of (long_integer, integer).  The standard python
>> dictionary works well for small numbers of keys, but starts to
>> perform badly for me when inserting roughly 5M keys:
>>
>>   # keys    dictionary    metakit    (both using psyco)
>>   ------    ----------    -------
>>   1M        8.8s          22.2s
>>   2M        24.0s         43.7s
>>   5M        115.3s        105.4s
>
> Are those clock times or CPU times?
User + system CPU time.

> How much memory is your process using and how much is available on your
> machine?

A very large amount of space is consumed, which is why I didn't give
times for the 10M version, which would have eaten my 1G of RAM and gone
into swap :-)

> I'm guessing an integer takes 4 bytes and a long integer takes roughly
> one byte per two decimal digits, plus a few more bytes to bundle them
> up into a tuple.  You've probably got something like 20 bytes per key,
> so 5M of them is 100 meg just for the keys.
>
> To get reasonable hashing performance, the hash table needs to be maybe
> half full, and figure a hash key is 4 bytes, so that's another 40 meg
> for the hash table itself.
>
> Plus whatever the values stored in the dictionary take up.  Even if
> you're not storing any values, there's probably 4 bytes for a null
> pointer (or a reference to None), so that's another 40 meg.
>
> These are all vague guesses, based on what I think are probably
> conservative estimates of what various objects must take up in memory,
> but you can see where this is going.  We're already up to 180 meg.

Yep, that size sounds about right (my dictionary values are all None).

> I wonder if the whole problem is that you're just paging yourself to
> death.  A few minutes watching your system memory performance with ps
> or top while your program is running might give you some idea if this
> is the case.

Definitely not.

Thanks anyway,

Chris
--
http://mail.python.org/mailman/listinfo/python-list