Re: Sorting in huge files

Steven Bethard Tue, 07 Dec 2004 13:35:35 -0800

Paul wrote:

I expect a few repeats for most of the keys, and that's actually part
of what I want to figure out in the end. (Said loosely, I want to group
all the data entries having "similar" keys. For this I need to sort the
keys first (data entries having _same_ key), and then figure out which
keys are "similar").

If this is really your final goal, you may not want to sort. Consider code like the following:

>>> entries = [('a', '4'),
...            ('x', '7'),
...            ('a', '2'),
...            ('b', '7'),
...            ('x', '4')]
>>> groups = {}
>>> for entry in entries:
...     key = entry[0]
...     groups.setdefault(key, []).append(entry)
...
>>> for key in groups:
...     print key, groups[key]
...
a [('a', '4'), ('a', '2')]
x [('x', '7'), ('x', '4')]
b [('b', '7')]

I've grouped all entries with the same key together using a dict object, with no sorting needed. If you had a good definition of 'similar', you could perhaps map all 'similar' keys to the same entry in the dict.
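
To make the 'similar' idea a bit more concrete, here is a rough sketch that passes each key through a function returning a canonical form and groups on that instead. The names normalize and group_similar are made up for illustration, and "differs only in case or surrounding whitespace" is just an assumed stand-in for whatever 'similar' really means for your data. It is written so that it should also run on Python 3.

```python
def normalize(key):
    # Hypothetical notion of 'similar': keys that differ only in
    # case or surrounding whitespace. Substitute your own rule here.
    return key.strip().lower()

def group_similar(entries, canonical=normalize):
    # Map each canonical key to the list of entries whose key reduces to it.
    groups = {}
    for entry in entries:
        groups.setdefault(canonical(entry[0]), []).append(entry)
    return groups

entries = [('a', '4'), ('X', '7'), ('A ', '2'), ('b', '7'), ('x', '4')]
groups = group_similar(entries)
for key in sorted(groups):
    print(key, groups[key])
```

This only helps when similarity can be reduced to a single canonical key. A fuzzier notion, where each key has to be compared against other keys, would need a different approach.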

If you really do need to sort, Python 2.4 provides a very nice way to sort by a particular key:

>>> import operator
>>> entries = [('a', '4'),
...            ('x', '7'),
...            ('a', '2'),
...            ('b', '7'),
...            ('x', '4')]
>>> entries.sort(key=operator.itemgetter(1))
>>> entries
[('a', '2'), ('a', '4'), ('x', '4'), ('x', '7'), ('b', '7')]

Here, I've sorted the entries by the second item in each tuple. If you go this route, you should also look at itertools.groupby:

>>> import itertools
>>> entries = [('a', '4'),
...            ('x', '7'),
...            ('a', '2'),
...            ('b', '7'),
...            ('x', '4')]
>>> entries.sort(key=operator.itemgetter(1))
>>> for key, values in itertools.groupby(entries, operator.itemgetter(1)):
...     print key, list(values)
...
2 [('a', '2')]
4 [('a', '4'), ('x', '4')]
7 [('x', '7'), ('b', '7')]

groupby collects consecutive entries that share a key, so on a sorted list it does the sort of grouping that I think you had in mind...
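
Since the stated goal was to group entries having the same key, the same sort-then-groupby pattern can be pointed at the first item of each tuple instead of the second. The sketch below is only an illustration: the names get_key and repeats are my own, and counting the repeats per key is an assumption about what you want to do with each group. It should run on Python 3 as well.

```python
import itertools
import operator

entries = [('a', '4'), ('x', '7'), ('a', '2'), ('b', '7'), ('x', '4')]

# Use the same key function for sorting and for grouping.
get_key = operator.itemgetter(0)

# groupby only merges adjacent items, so sort on the key first.
entries.sort(key=get_key)

# Count how many entries share each key.
repeats = {}
for key, group in itertools.groupby(entries, get_key):
    repeats[key] = len(list(group))

print(repeats)
```

Because the sort is stable, entries with the same key keep their original relative order inside each group.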

Steve
