"r.e.s." <[EMAIL PROTECTED]> writes: > I have a million-line text file with 100 characters per line, > and simply need to determine how many of the lines are distinct.

I'd generalise it by allowing the caller to pass any iterable of
items. A file object can be iterated this way, line by line, but so
can any other sequence or iterable.

    def count_distinct(seq):
        """ Count the number of distinct items """
        counts = dict()
        for item in seq:
            if item not in counts:
                counts[item] = 0
            counts[item] += 1
        return len(counts)

    >>> infile = file('foo.txt')
    >>> for line in file('foo.txt'):
    ...     print line,
    ...
    abc
    def
    ghi
    abc
    ghi
    def
    xyz
    abc
    abc
    def
    >>> infile = file('foo.txt')
    >>> print count_distinct(infile)
    5

-- 
 \     "A man may be a fool and not know it -- but not if he is |
  `\                           married."  -- Henry L. Mencken |
_o__)                                                          |
Ben Finney
-- 
http://mail.python.org/mailman/listinfo/python-list
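
As a hedged aside to the dict-based function above: if only the number of distinct lines is wanted, the per-item tallies are not needed, and a set gives the same count. The sketch below is written for Python 3 (the session above uses Python 2 syntax), and the name count_distinct_lines is made up for illustration. It also strips the line terminator before comparing, which is an assumption about what "distinct" should mean: without it, a final line that has no trailing newline counts as different from an otherwise identical earlier line.

```python
import io


def count_distinct_lines(lines):
    """Count distinct lines, ignoring each line's trailing newline.

    Hypothetical helper, not from the original post.
    """
    # A set keeps one entry per distinct value, so its size is the answer.
    return len({line.rstrip("\n") for line in lines})


# Made-up sample data: ten lines, the last one without a trailing newline.
sample = "abc\ndef\nghi\nabc\nghi\ndef\nxyz\nabc\nabc\ndef"

# Comparing raw lines treats "def\n" and "def" as different items.
raw_count = len(set(io.StringIO(sample)))

# Stripping the terminator first compares only the visible text.
stripped_count = count_distinct_lines(io.StringIO(sample))

print(raw_count, stripped_count)  # prints: 5 4
```

For a real file, an open file object can be passed in the same way, for example inside `with open('foo.txt') as infile:`. Either approach holds every distinct line in memory; for a million lines of 100 characters that is on the order of 100 MB of text in the worst case, plus per-object overhead.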