Tim Rowe <digi...@gmail.com> writes:
> We were told in the original question: more than 15 million records,
> and it won't all fit into memory. So your observation is pertinent.

That is not terribly many records by today's standards.  The knee-jerk
approach is to sort them externally, then make a linear pass skipping
the duplicates.  Is the exercise to write an external sort in Python?
It's worth doing if you've never done it before.
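
For anyone who wants to try it, here is a rough sketch of that approach,
not a tuned implementation. It assumes one record per line with no
embedded newlines, and that a chunk of chunk_size records fits in
memory. The names unique_records, _write_run, _read_run and chunk_size
are made up for illustration. Sorted chunks are written to temporary
files, heapq.merge streams them back in order, and a linear pass skips
any record equal to the one before it.

```python
import heapq
import itertools
import os
import tempfile


def _write_run(sorted_lines):
    """Write one sorted chunk (a "run") to a temp file; return its path."""
    fd, path = tempfile.mkstemp(suffix=".run", text=True)
    with os.fdopen(fd, "w") as f:
        f.writelines(sorted_lines)
    return path


def _read_run(path):
    """Lazily yield the lines of one run file."""
    with open(path) as f:
        yield from f


def unique_records(lines, chunk_size=1_000_000):
    """Yield each distinct record once, in sorted order.

    Assumption: `lines` is an iterable of strings, one record per line.
    """
    lines = iter(lines)
    run_paths = []
    try:
        # Phase 1: sort memory-sized chunks and spill each one to disk.
        while True:
            chunk = list(itertools.islice(lines, chunk_size))
            if not chunk:
                break
            # Normalize line endings so a final unterminated line compares
            # equal to the same record followed by a newline.
            chunk = [line.rstrip("\n") + "\n" for line in chunk]
            chunk.sort()
            run_paths.append(_write_run(chunk))

        # Phase 2: k-way merge of the runs, then a linear pass that
        # skips any record equal to its predecessor.
        merged = heapq.merge(*(_read_run(p) for p in run_paths))
        previous = None
        for line in merged:
            if line != previous:
                yield line.rstrip("\n")
                previous = line
    finally:
        # Clean up the temporary run files.
        for path in run_paths:
            os.remove(path)
```

Typical use would be something like iterating over
unique_records(open("records.txt")) and writing each record out, where
records.txt is a placeholder file name. With 15 million records and the
default chunk_size this opens about 15 run files at once, which is well
within normal file-handle limits. Note that the output comes back
sorted, not in the original input order.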
--
http://mail.python.org/mailman/listinfo/python-list
