When I run uniq from coreutils 5.0 with LANG=en_GB.UTF-8 on a glibc 2.3.2
system, on a file which (I think) is not valid UTF-8, I get a confusing
result: the two lines in the file are identical, but uniq prints both of
them and returns an exit code of 0. If I run

LANG=C uniq <file>

I get the expected single line of output. What I expect when I run with
LANG=en_GB.UTF-8 is either for uniq to return an error (because the file
is not valid text), or to print one single line (if it's being lenient).

The only way I might be wrong is if the file can be interpreted as a UTF-8
file with two non-identical lines, but I don't think it can.
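
A minimal sketch of that check, assuming Python 3 is to hand. The helper
names (is_valid_utf8, bytewise_uniq) are made up for illustration, and
bytewise_uniq only mimics a plain byte comparison of adjacent lines; it is
not how uniq itself is implemented. A lone 0x80 byte is a UTF-8
continuation byte with no lead byte before it, so a strict decoder should
reject the data outright rather than see two different lines.

```python
# The two lines of the file as raw bytes; \200 is the octal escape for 0x80.
LINE = b"ushort f(char,double,char,double):('a',0.2,'\200',0.4)->65506\n"
DATA = LINE * 2


def is_valid_utf8(data):
    """Return True if data decodes as strict UTF-8, False otherwise."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def bytewise_uniq(data):
    """Collapse adjacent identical lines, comparing raw bytes only."""
    out = []
    for line in data.splitlines(keepends=True):
        if not out or out[-1] != line:
            out.append(line)
    return out


lines = DATA.splitlines(keepends=True)
print(lines[0] == lines[1])      # True: the lines are byte-identical
print(is_valid_utf8(DATA))       # False: 0x80 cannot start a UTF-8 sequence
print(len(bytewise_uniq(DATA)))  # 1: a byte comparison leaves a single line
```

So the data is not valid UTF-8 under a strict decoder, and a plain byte
comparison treats the two lines as equal.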

I attach the relevant file, and display it below (the \200 is a literal
top-bit-set byte, value octal 0200). The file ends with a linefeed, in
case you're wondering!

ushort f(char,double,char,double):('a',0.2,'\200',0.4)->65506
ushort f(char,double,char,double):('a',0.2,'\200',0.4)->65506
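
In case the top-bit-set byte does not survive the trip through mail, here
is a rough sketch for recreating the file, again assuming Python 3. The
file name uniq-test.txt and the helper write_test_file are arbitrary
choices for illustration. The file is opened in binary mode so that the
0x80 byte is written as-is rather than being transcoded.

```python
import os
import tempfile

# One line of the test file; \200 is the octal escape for the 0x80 byte.
LINE = b"ushort f(char,double,char,double):('a',0.2,'\200',0.4)->65506\n"


def write_test_file(path):
    """Write the two identical lines to path without any text encoding."""
    with open(path, "wb") as fh:
        fh.write(LINE * 2)


# Hypothetical location: a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "uniq-test.txt")
write_test_file(path)

# Read the file back in binary mode to confirm what actually hit the disk.
with open(path, "rb") as fh:
    data = fh.read()
print(path)
print(data.count(b"\x80"), data.endswith(b"\n"))
```

The printed path can then be given to uniq under LANG=en_GB.UTF-8 and
under LANG=C to compare the two behaviours.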

-- 
http://www.mupsych.org/~rrt/ | The only person worth beating is yourself
_______________________________________________
Bug-coreutils mailing list
[EMAIL PROTECTED]
http://mail.gnu.org/mailman/listinfo/bug-coreutils
