> Not sure if this has already been discovered, but I found a problem with 
> uniq. If I sat down and looked at the code, I could probably see how to 
> fix it. It seems to always occur with very large unsorted streams (files).
> 
> Below are the commands I ran to exploit the bug (which I originally 
> thought was my error). Sorting the stream before removing duplicate 
> lines is inconsistent with just removing duplicate lines:

Thanks for the report.  However, uniq only works on sorted streams.  By
definition, uniq only looks at consecutive lines, to see if they are identical.
If the file is not sorted, then the same line might appear more than once
in the output.  Changing this would slow uniq down (it would require
either more memory or more time to keep a list of all previously seen
unique lines), not to mention violate POSIX.
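A minimal demonstration of the adjacent-lines behavior (using a throwaway
file name for illustration): uniq only collapses *consecutive* duplicates,
so on unsorted input a repeated line that is not adjacent to its twin
survives in the output.

```shell
# Create a tiny unsorted file with a non-adjacent duplicate.
printf 'b\na\nb\n' > demo.txt

# uniq alone: the two "b" lines are not consecutive, so both remain.
uniq demo.txt
# b
# a
# b

# Sorting first makes duplicates adjacent, so uniq can remove them.
sort demo.txt | uniq
# a
# b
```

For the common case, `sort -u` does both steps in one pass.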

> Note that srv_inodes.txt as generated is about 70 thousand inode 
> numbers. I've attached this file.

That was a little presumptuous of you - this is a public mailing list,
and you just blasted 150k of data that means very little to a large
number of recipients.  Usually it is better to reduce your test case
to something that fits in the body of your message.

-- 
Eric Blake


_______________________________________________
Bug-coreutils mailing list
Bug-coreutils@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-coreutils
