forrest yang wrote:
I'm trying to load a big file of about 9,000,000 lines into a dict; the data
looks something like
1 2 3 4
2 2 3 4
3 4 5 6

How "like" is it ?-)

code

d = {}                              # plain dict; avoid shadowing the builtin 'dict'
for line in open(path):             # 'path' is the name of the data file
    arr = line.strip().split('\t')  # four tab-separated fields per line
    d[arr[0]] = arr                 # first field is the key

but the dict gets really slow as I load more data into memory.

Looks like your system is starting to swap. Use 'top' or any other system monitor to check it out.
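If you'd rather check from inside the script than with top, a small sketch like
this one (my addition, not from the original post) uses the stdlib resource
module to report the process's peak resident size while you load; note the
units differ between platforms:

import resource

# Peak resident set size of this process so far:
# kilobytes on Linux, bytes on Mac OS X.
print(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)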

By the way, the Mac I use has 16 GB of memory.
Is this caused by the dict performing poorly when it has to grow,

dicts are Python's central data type (objects are based on dicts, all non-local namespaces are based on dicts, etc.), so you can safely assume they are highly optimized.
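For instance, a quick and unscientific timing sketch like the one below (my
own, not part of the original exchange) shows lookup time staying roughly flat
as the dict grows, so the per-lookup cost is unlikely to be the problem:

import timeit

for n in (10000, 100000, 1000000):
    d = dict((str(i), i) for i in range(n))
    # time 100,000 lookups of an existing key
    t = timeit.timeit(lambda: d['42'], number=100000)
    print(n, "keys:", round(t, 4), "seconds for 100,000 lookups")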

or by some other reason?

FWIW, a very loose (and partially wrong, cf. below) estimate based on wild guesses: assuming an average size of 512 bytes per object (remember that Python doesn't have 'primitive' types) and about five objects per row (four strings plus the list holding them), the above would use roughly 9,000,000 * 5 * 512 bytes =~ 22 GB.

Fortunately, CPython does cache some values of some immutable types (specifically, small ints and strings that match the grammar for Python identifiers), so depending on your real data, you might need a bit less RAM. Also, the 512 bytes per object is really more of a wild guess than anything else (but given the internal structure of a CPython object, I think it's about that order of magnitude - please someone correct me if I'm plain wrong).
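If you want to sanity-check that guess on your own data, a rough sketch along
these lines (the sample row is made up, and the dict's own overhead is
ignored) gives a ballpark per-row figure with sys.getsizeof:

import sys

arr = "1234567\t2\t3\t4".strip().split('\t')   # hypothetical sample row
per_row = sys.getsizeof(arr) + sum(sys.getsizeof(s) for s in arr)
print(per_row, "bytes for the list and its four strings")
print(per_row * 9000000 / (1024.0 ** 3), "GB for 9,000,000 such rows")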

Anyway: I'm afraid the problem has more to do with your design than with your code or Python's dict implementation itself.

Is there anyone who can provide a better solution?

Use a DBMS. They are designed - and highly optimised - for fast lookup over huge data sets.
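As a rough sketch of what that could look like with the sqlite3 module from
the stdlib (the file names and schema below are just placeholders, not part
of your setup):

import sqlite3

conn = sqlite3.connect('rows.db')            # hypothetical database file
conn.execute('CREATE TABLE IF NOT EXISTS rows (key TEXT PRIMARY KEY, '
             'c1 TEXT, c2 TEXT, c3 TEXT)')

with open('bigfile.txt') as f:               # hypothetical input file
    rows = (line.strip().split('\t') for line in f)
    conn.executemany('INSERT OR REPLACE INTO rows VALUES (?, ?, ?, ?)', rows)
conn.commit()

# Lookups go through the PRIMARY KEY index, so they stay fast even with
# millions of rows and use almost no RAM.
print(conn.execute('SELECT * FROM rows WHERE key = ?', ('3',)).fetchone())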

My 2 cents.
