On 20 Feb, 06:36, krishna <krishna.k.0...@gmail.com> wrote:
> I have to manage a couple of dicts with huge dataset (larger than
> feasible with the memory on my system), it basically has a key which
> is a string (actually a tuple converted to a string) and a two item
> list as value, with one element in the list being a count related to
> the key. I have to at the end sort this dictionary by the count.
>
> The platform is linux. I am planning to implement it by setting a
> threshold beyond which I write the data into files (3 columns: 'key
> count some_val' ) and later merge those files (I plan to sort the
> individual files by the key column and walk through the files with one
> pointer per file and merge them; I would add up the counts when
> entries from two files match by key) and sorting using the 'sort'
> command. Thus the bottleneck is the 'sort' command.
>
> Any suggestions, comments?
>
> By the way, is there a linux command that does the merging part?
>
> Thanks,
> Krishna

Have you looked here?

http://docs.python.org/library/persistence.html

-- 
Arnaud
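
For the merging part of the quoted plan (one pointer per key-sorted file, adding up the counts when keys match), here is a minimal Python sketch. It is an illustration under stated assumptions, not a tested solution for the original data: it assumes a recent Python 3, the three columns are assumed to be tab-separated (the key is a stringified tuple and may contain spaces), the names `read_run` and `merge_sorted_runs` are made up for this example, and when keys match it simply keeps the `some_val` from the first file, since the original post does not say which one should win. On the command-line question: `sort -m` merges files that are already sorted, but it does not add up counts for matching keys.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def read_run(path):
    """Yield (key, count, some_val) from one spill file.

    Assumption: one record per line, tab-separated, file already sorted by key.
    """
    with open(path) as f:
        for line in f:
            key, count, some_val = line.rstrip("\n").split("\t")
            yield key, int(count), some_val


def merge_sorted_runs(runs):
    """Merge key-sorted iterables of (key, count, some_val), summing counts.

    heapq.merge keeps only one pending record per run in memory, which is
    the "one pointer per file" walk described in the quoted message.
    """
    merged = heapq.merge(*runs, key=itemgetter(0))
    for key, group in groupby(merged, key=itemgetter(0)):
        records = list(group)  # at most a few records: those sharing this key
        total = sum(count for _, count, _ in records)
        # Assumption: keep some_val from the first run that has this key.
        yield key, total, records[0][2]


# Small in-memory stand-ins for two spill files, each sorted by key.
run_a = [("a", 2, "x"), ("c", 1, "y")]
run_b = [("a", 3, "x"), ("b", 5, "z")]
totals = list(merge_sorted_runs([run_a, run_b]))
```

With real spill files, the call would look like `merge_sorted_runs([read_run(p) for p in paths])`, where `paths` is a hypothetical list of file names. The merged records are ordered by key, not by count, so the final ordering by count still needs a separate pass, for example by writing the merged records to a file and sorting that file numerically on the count column.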