uniq prints invalid unique lines multiple times

Reuben Thomas Mon, 23 Feb 2004 13:06:13 -0800

When I run uniq from coreutils 5.0 with LANG=en_GB.UTF-8 on a glibc 2.3.2
system on a file which (I think) is not valid UTF-8, I get a confusing
result: the two lines in the file are identical, but uniq prints them
both, and returns an exit code of 0. If I run


LANG=C uniq <file>

I get the expected single line of output. What I expect when I run with
LANG=en_GB.UTF-8 is either for uniq to return an error (because the file
is not valid text), or to print one single line (if it's being lenient).

The only way I might be wrong is if the file can be interpreted as a UTF-8
file with two non-identical lines, but I don't think it can.
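
For what it's worth, here is a minimal sketch of how the "not valid UTF-8" guess could be checked. It assumes a POSIX shell, a printf that understands octal escapes in its format string, and an iconv that exits non-zero on an illegal input sequence. The scratch file name is hypothetical; it is a recreation of the file, not the original attachment.

```shell
# Hypothetical scratch file recreating the two-line input (an assumption,
# not the original attachment).
sample=$(mktemp)

# Write the same line twice; \200 in the printf format becomes the single
# byte octal 0200 (0x80).
for i in 1 2; do
    printf "ushort f(char,double,char,double):('a',0.2,'\200',0.4)->65506\n"
done > "$sample"

# Ask iconv to decode the file as UTF-8. A lone 0x80 byte is a continuation
# byte with no lead byte, so a conforming decoder should reject it.
if iconv -f UTF-8 -t UTF-8 "$sample" > /dev/null 2>&1; then
    valid=yes
else
    valid=no
fi
echo "valid UTF-8: $valid"

rm -f "$sample"
```

If iconv rejects the input, the file cannot be read as UTF-8 at all, so it cannot be a UTF-8 file with two non-identical lines either.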

I attach the relevant file, and display it below (the \200 is a literal
top-bit-set byte, value octal 0200). The file ends with a linefeed, in
case you're wondering!

ushort f(char,double,char,double):('a',0.2,'\200',0.4)->65506
ushort f(char,double,char,double):('a',0.2,'\200',0.4)->65506
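
A rough reproduction sketch, in case the attachment does not survive: it recreates the two lines shown above and counts the lines uniq prints under each locale. The scratch file and variable names are hypothetical, and it uses LC_ALL rather than LANG (my choice here, so that no other LC_* setting in the environment overrides the locale). It also assumes the en_GB.UTF-8 locale is installed; if it is not, the second run may simply behave like the C locale.

```shell
# Hypothetical scratch file recreating the two identical lines shown above.
sample=$(mktemp)
for i in 1 2; do
    printf "ushort f(char,double,char,double):('a',0.2,'\200',0.4)->65506\n"
done > "$sample"

# Count the lines uniq emits in the C locale (one line is expected).
count_c=$(LC_ALL=C uniq "$sample" | wc -l | tr -d ' ')

# Count the lines uniq emits in the UTF-8 locale. The result depends on the
# coreutils and glibc versions, so no expected value is stated here.
count_utf8=$(LC_ALL=en_GB.UTF-8 uniq "$sample" 2> /dev/null | wc -l | tr -d ' ')

echo "C locale:     $count_c line(s)"
echo "UTF-8 locale: $count_utf8 line(s)"

rm -f "$sample"
```

On the system described above, the second count is the one that comes out as 2 instead of 1.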

-- 
http://www.mupsych.org/~rrt/ | The only person worth beating is yourself
