Paddy wrote:
> If the log has a lot of repeated lines in its original state then
> running uniq twice, once up front to reduce what needs to be sorted,
> might be quicker?
Having the uniq and sort steps integrated in a single piece of software allows for the most optimization opportunities. The sort utility, under -u, could squash duplicate lines on the input side /and/ the output side.

> uniq log_file | sort | uniq | wc -l

Now you have two more pipeline elements, two more tasks running, and four more copies of the data being made as it travels through two extra pipes in the kernel. Or only two more copies, if you are lucky enough to have a "zero copy" pipe implementation which allows data to go from the writer's buffer directly to the reader's without intermediate kernel buffering.
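To illustrate the single-process idea the reply is arguing for, here is a minimal sketch in Python (this list's language); "log_file" is a placeholder name, not one taken from the thread. A set deduplicates as it reads, so no sorting and no extra pipeline stages are needed at all:

    # Count distinct lines in one process: no extra tasks, no pipe
    # buffers, no copies between processes. Each set element is a
    # line including its trailing newline.
    with open("log_file") as f:
        print(len(set(f)))

The shell baseline this compares against is presumably the integrated form, sort -u log_file | wc -l, relative to which the quoted four-stage pipeline adds the "two more pipeline elements, two more tasks" described above.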