I have to manage a couple of dicts holding a huge dataset (larger than the memory on my system can hold). Each dict has a string key (actually a tuple converted to a string) and a two-item list as its value, one element of which is a count related to the key. At the end I have to sort the dictionary by that count.
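To make the data layout concrete, here is a minimal sketch of the in-memory version that stops working once the data outgrows RAM. The helper names (`record`, `sorted_by_count`), the position of the count as the first list element, and the ascending sort order are assumptions made for illustration only.

```python
def record(table, parts, some_val):
    """Count one occurrence of the tuple `parts` in `table`.

    `table` maps str(tuple) -> [count, some_val]. Putting the count first
    in the list is an assumption; the post does not say which slot it uses.
    """
    entry = table.setdefault(str(parts), [0, some_val])
    entry[0] += 1


def sorted_by_count(table):
    """Return (key, [count, some_val]) pairs ordered by ascending count."""
    return sorted(table.items(), key=lambda item: item[1][0])


if __name__ == "__main__":
    table = {}
    record(table, ("a", "b"), "x")
    record(table, ("a", "b"), "x")
    record(table, ("c",), "y")
    for key, (count, some_val) in sorted_by_count(table):
        print(key, count, some_val)
```

This works only while the whole table fits in memory, which is the limit the rest of the post is about.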
The platform is Linux. I am planning to implement this by setting a threshold beyond which I write the data out to files (3 columns: 'key count some_val') and later merging those files. For the merge, I plan to sort the individual files by the key column, walk through the files with one pointer per file, and add up the counts when entries from two files match by key, with the sorting done using the 'sort' command. The bottleneck would thus be the 'sort' command. Any suggestions or comments? By the way, is there a Linux command that does the merging part?
Thanks,
Krishna
--
http://mail.python.org/mailman/listinfo/python-list
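The spill-and-merge plan described above could be sketched in Python roughly as follows. This is one possible reading of the plan, not a tuned implementation: the function names (`spill`, `read_run`, `merge_runs`) are hypothetical, a tab is assumed as the column separator because str(tuple) keys contain spaces, and `some_val` is assumed to be the same for every occurrence of a key, so the first one seen is kept.

```python
import heapq
import os
import tempfile
from itertools import groupby
from operator import itemgetter

SEP = "\t"  # assumption: tab-separated columns, since the keys contain spaces


def spill(table, directory):
    """Write {key: [count, some_val]} to a new file, sorted by key."""
    fd, path = tempfile.mkstemp(suffix=".run", dir=directory, text=True)
    with os.fdopen(fd, "w") as out:
        for key in sorted(table):
            count, some_val = table[key]
            out.write(f"{key}{SEP}{count}{SEP}{some_val}\n")
    return path


def read_run(path):
    """Yield (key, count, some_val) rows from one spilled file, in key order."""
    with open(path) as run:
        for line in run:
            key, count, some_val = line.rstrip("\n").split(SEP)
            yield key, int(count), some_val


def merge_runs(paths):
    """Walk all key-sorted files together, adding up counts for equal keys."""
    rows = heapq.merge(*(read_run(p) for p in paths), key=itemgetter(0))
    for key, group in groupby(rows, key=itemgetter(0)):
        group = list(group)  # at most one row per file for a given key
        # assumption: keep some_val from the first row seen for this key
        yield key, sum(row[1] for row in group), group[0][2]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        runs = [
            spill({"('a', 'b')": [2, "x"], "('c', 'd')": [5, "y"]}, tmp),
            spill({"('a', 'b')": [3, "x"]}, tmp),
        ]
        for key, count, some_val in merge_runs(runs):
            print(key, count, some_val, sep=SEP)
```

Only one row per open file is held in memory at a time, so the merge itself stays small. The merged rows come out ordered by key, so the final ordering by count is still a separate step; writing them to a file and running something like `sort -t$'\t' -k2,2n` on it is one option. If the per-file sorting is done with the 'sort' command instead of Python's `sorted`, the comparison order has to match on both sides (for example by running it under `LC_ALL=C`), otherwise the merge can miss matching keys.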