forrest yang wrote:
I'm trying to load a big file of about 9,000,000 lines into a dict; the data
looks something like
1 2 3 4
2 2 3 4
3 4 5 6

How "like" is it ?-)

code

d = {}                              # plain dict; avoid shadowing the builtin 'dict'
for line in open(path):             # 'path' is the name of the data file
    arr = line.strip().split('\t')  # four tab-separated fields per line
    d[arr[0]] = arr                 # first field is the key

but the dict gets really slow as I load more data into memory.

Looks like your system is starting to swap. Use 'top' or any other system monitor to check it out.
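If you'd rather check from inside the script than with top, a small sketch like
this one (my addition, not from the original post) uses the stdlib resource
module to report the process's peak resident size while you load; note the
units differ between platforms:

import resource

# Peak resident set size of this process so far:
# kilobytes on Linux, bytes on Mac OS X.
print(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)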

By the way, the Mac I use has 16 GB of memory.
Is this caused by the dict performing poorly when it has to grow,

dicts are Python's central data type (objects are based on dicts, all non-local namespaces are based on dicts, etc.), so you can safely assume they are highly optimized.
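For instance, a quick and unscientific timing sketch like the one below (my
own, not part of the original exchange) shows lookup time staying roughly flat
as the dict grows, so the per-lookup cost is unlikely to be the problem:

import timeit

for n in (10000, 100000, 1000000):
    d = dict((str(i), i) for i in range(n))
    # time 100,000 lookups of an existing key
    t = timeit.timeit(lambda: d['42'], number=100000)
    print(n, "keys:", round(t, 4), "seconds for 100,000 lookups")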

or by some other reason?

FWIW, a very loose (and partially wrong, cf. below) estimate based on wild guesses: assuming an average size of 512 bytes per object (remember that Python doesn't have 'primitive' types) and about five objects per row (four strings plus the list holding them), the above would use roughly 9,000,000 * 5 * 512 bytes =~ 22 GB.

Fortunately, CPython does cache some values of some immutable types (specifically, small ints and strings that match the grammar for Python identifiers), so depending on your real data, you might need a bit less RAM. Also, the 512 bytes per object is really more of a wild guess than anything else (but given the internal structure of a CPython object, I think it's about that order of magnitude - please someone correct me if I'm plain wrong).
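If you want to sanity-check that guess on your own data, a rough sketch along
these lines (the sample row is made up, and the dict's own overhead is
ignored) gives a ballpark per-row figure with sys.getsizeof:

import sys

arr = "1234567\t2\t3\t4".strip().split('\t')   # hypothetical sample row
per_row = sys.getsizeof(arr) + sum(sys.getsizeof(s) for s in arr)
print(per_row, "bytes for the list and its four strings")
print(per_row * 9000000 / (1024.0 ** 3), "GB for 9,000,000 such rows")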

Anyway: I'm afraid the problem has more to do with your design than with your code or Python's dict implementation itself.

Is there anyone who can provide a better solution?

Use a DBMS. They are designed - and highly optimised - for fast lookup over huge data sets.
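As a rough sketch of what that could look like with the sqlite3 module from
the stdlib (the file names and schema below are just placeholders, not part
of your setup):

import sqlite3

conn = sqlite3.connect('rows.db')            # hypothetical database file
conn.execute('CREATE TABLE IF NOT EXISTS rows (key TEXT PRIMARY KEY, '
             'c1 TEXT, c2 TEXT, c3 TEXT)')

with open('bigfile.txt') as f:               # hypothetical input file
    rows = (line.strip().split('\t') for line in f)
    conn.executemany('INSERT OR REPLACE INTO rows VALUES (?, ?, ?, ?)', rows)
conn.commit()

# Lookups go through the PRIMARY KEY index, so they stay fast even with
# millions of rows and use almost no RAM.
print(conn.execute('SELECT * FROM rows WHERE key = ?', ('3',)).fetchone())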

My 2 cents.
