On Wed, Mar 10, 2010 at 11:47 AM, Krishna K <krishna.k.0...@gmail.com> wrote:
>
> On Fri, Feb 19, 2010 at 11:27 PM, Jonathan Gardner
> <jgard...@jonathangardner.net> wrote:
>>
>> On Fri, Feb 19, 2010 at 10:36 PM, krishna <krishna.k.0...@gmail.com>
>> wrote:
>> > I have to manage a couple of dicts with a huge dataset (larger
>> > than fits in the memory on my system). Each basically maps a key
>> > that is a string (actually a tuple converted to a string) to a
>> > two-item list, with one element of the list being a count related
>> > to the key. At the end I have to sort this dictionary by the
>> > count.
>> >
>> > The platform is Linux. I am planning to implement it by setting a
>> > threshold beyond which I write the data into files (3 columns:
>> > 'key count some_val') and later merge those files: I plan to sort
>> > the individual files by the key column and walk through the files
>> > with one pointer per file, merging them and adding up the counts
>> > when entries from two files match by key. The final sort by count
>> > would use the 'sort' command. Thus the bottleneck is the 'sort'
>> > command.
>> >
>> > Any suggestions, comments?
>> >
>>
>> You should be using BDBs or even something like PostgreSQL. The
>> indexes there will give you the scalability you need. I doubt you
>> will be able to write anything that will select, update, insert or
>> delete data better than what BDBs and PostgreSQL can give you.
>>
>> --
>> Jonathan Gardner
>> jgard...@jonathangardner.net
>
> Thank you. I tried BDB; it seems to get very slow as it scales.
>
> Thank you,
> Krishna
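For concreteness, the merge step described above might look roughly like
this in Python. This is a minimal sketch under several assumptions that
are not in the original posts: tab-separated columns, chunk files
already sorted by key, and "keep the last some_val seen" as the rule
for combining values of matching keys.

    import heapq
    import itertools

    def parse(line):
        # Each line of a chunk file: "key<TAB>count<TAB>some_val"
        key, count, val = line.rstrip('\n').split('\t')
        return key, int(count), val

    def merge_chunks(chunk_paths, out_path):
        # One open file (i.e. one "pointer") per sorted chunk file.
        files = [open(p) for p in chunk_paths]
        streams = [(parse(line) for line in f) for f in files]
        # heapq.merge (Python 2.6+) does a streaming k-way merge of the
        # key-sorted streams; tuples compare by key first, so the
        # merged output stays sorted by key.
        merged = heapq.merge(*streams)
        out = open(out_path, 'w')
        for key, group in itertools.groupby(merged, lambda rec: rec[0]):
            total = 0
            some_val = None
            for _, count, val in group:
                total += count    # add up counts for matching keys
                some_val = val    # assumption: last value seen wins
            out.write('%s\t%d\t%s\n' % (key, total, some_val))
        out.close()
        for f in files:
            f.close()

The final sort by count can then stay with the external 'sort' command,
e.g. (bash syntax for a literal tab separator):

    sort -t$'\t' -k2,2nr merged.txt > sorted_by_count.txt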
Have you tried any of the big key-value store systems, like couchdb,
etc.?

Geremy Condra
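To make that suggestion concrete: a minimal sketch using the
couchdb-python package against a local CouchDB server. The database
name and the document layout are illustrative, not from the thread.

    import couchdb

    server = couchdb.Server('http://localhost:5984/')
    db = server.create('key_counts')  # hypothetical database name

    # One document per key; the key itself serves as the document id.
    db['some-key'] = {'count': 1, 'val': 'some_val'}

    # Incrementing a count is a read-modify-write of the document.
    doc = db['some-key']
    doc['count'] += 1
    db.save(doc)

    # Sorting by count happens server-side, via a (temporary) view
    # whose map function emits the count as the view key.
    map_fun = "function(doc) { emit(doc.count, doc._id); }"
    for row in db.query(map_fun, descending=True):
        print row.key, row.value

Note the read-modify-write per increment: whether that actually scales
better than the BDB attempt would need measuring against the real
workload.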