Seongsu Lee: >I have a dictionary with million keys. Each value in the dictionary has a list >with up to thousand integers.<
Let's say each integer can be represented with 32 bits (if there are less numbers then a 3-byte representation may suffice, but this makes things more complex), that is 2^2 bytes. Let's say there are 2^20 keys each one associated to 2^10 values. So to represent the values you need 2^32 bytes. It means 4 GB, so I don't think Python suffices to store them in RAM, because a Python int object requires quite more than 4 bytes (only represented inside an array.array it may need just 4 bytes). So if you can use 128 MB RAM to store such data structure you need to store data on HD too. You probably can use a lower-level language. On disk you can keep the reverse index, represented as an array of records/structs, each of such structures keep two 32-bit numbers (so such array is 8 GB). Such index is sorted according to the first element of the struct. The first number is the value of the original dictionary and the second nuber is its key. Inside the RAM you can keep another sorted array that "summarizes" your whole data. When you need a number you can do a binary search on the array in RAM, such array gives you the position where you can read (with a seek) a little part of the file (512 bytes may suffice), to perform a little binary search (if the block is very little a linear scan suffices) on it too to find the number you need. Note that the summarizing data structure in RAM may be represented with just a Python dict too, so in the end you can use Python to solve this problem. You may need a lower-level language to create the 8 GB file on disk (or create it with Python, but it may take lot of time. You may sort it with the sort unix command). This isn't a complete solution, but I think it may work. Bye, bearophile -- http://mail.python.org/mailman/listinfo/python-list