"Tim Chase" <[EMAIL PROTECTED]> wrote ... > 2) use a python set: > > s = set() > for line in open("file.in"): > s.add(line.strip()) > return len(s) > > 3) compact #2: > > return len(set([line.strip() for line in file("file.in")])) > > or, if stripping the lines isn't a concern, it can just be > > return len(set(file("file.in"))) > > The logic in the set keeps track of ensuring that no > duplicates get entered. > > Depending on how many results you *expect*, this could > become cumbersome, as you have to have every unique line in > memory. A stream-oriented solution can be kinder on system > resources, but would require that the input be sorted first.
Thank you (and all the others who responded!) -- set() does the trick,
reducing the job to about a minute. I may play later with the other
alternatives people mentioned (dict(), hash(), ...), just out of
curiosity. I take your point about the "expected number", which in my
case was around 0-10 (as it turned out, there were no dups).

BTW, the first thing I tried was Fredrik Lundh's program:

    def number_distinct(fn):
        return len(set(s.strip() for s in open(fn)))

which worked without the square brackets. Interesting that omitting
them doesn't seem to matter.
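
A small check of the brackets question, as far as I understand it: with the square brackets the argument is a list comprehension, which builds the whole list before set() sees it; without them it is a generator expression (available since Python 2.4), which hands the stripped lines to set() one at a time. set() accepts any iterable, so the count comes out the same. The sample data below is made up for illustration.

```python
# Made-up sample lines standing in for the contents of a file.
lines = ["spam\n", "eggs\n", "spam\n", "  eggs  \n"]

# List comprehension: the full list of stripped lines is built first.
with_brackets = len(set([s.strip() for s in lines]))

# Generator expression: stripped lines are fed to set() one by one.
# The extra parentheses may be dropped because the generator
# expression is the only argument in the call.
without_brackets = len(set(s.strip() for s in lines))

print(with_brackets, without_brackets)
```

The result is identical either way; the difference is only that the generator form avoids holding the intermediate list in memory, which may matter for a large file.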