forcemerge 11967 11968
tag 11967 notabug
thanks

On 07/17/2012 12:17 PM, Jaime Gaspar wrote:
> I think that there is a bug in "uniq" (version 8.13).

Is this your distro's build?  In any case, I reproduced your report with the latest coreutils.git (post-8.17), so this is not likely to be a bug in a distro-specific multibyte patch.

>
> The file "bug.txt" attached consists of two lines:
> - the first one containing a character that
>   looks like a "v" and a line break;
> - the second one containing a character that
>   looks like a upside down "v" and a line break.
> In hex:
>
> E2 88 A8 0A
> E2 88 A7 0A

The glyphs you describe line up with Unicode characters: those bytes are the UTF-8 encodings of U+2228 (LOGICAL OR) and U+2227 (LOGICAL AND).  I bet you are using a locale with UTF-8 character encoding.

>
> When we run "uniq bug.txt" in a terminal, "uniq" outputs a single line, so
> "uniq" thinks that the two lines are equal, but they are not.

I can reproduce your symptoms, but only when I fudge my locale:

$ LC_ALL=C uniq ../bug.txt
∨
∧
$ LC_ALL=en_US.UTF-8 uniq ../bug.txt
∨
$

Remember, 'uniq' is required by POSIX to use the same line comparison techniques as 'sort'; and 'sort' is required to use strcoll() (not strcmp()) to compare lines.  In your particular choice of locale, strcoll() happens to state that '∨' and '∧' collate identically; hence uniq is correct in stating that you have a duplicated line according to your current locale.

$ LC_ALL=en_US.UTF-8 sort ../bug.txt -u --debug
sort: using ‘en_US.UTF-8’ sorting rules
∨
_
$

So I'm closing this as not a bug, along with a final pointer to our FAQ:

https://www.gnu.org/software/coreutils/faq/#Sort-does-not-sort-in-normal-order_0021

-- 
Eric Blake   ebl...@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
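
If you want to check the locale dependence on your own system, here is a minimal sketch.  It recreates the two-line file from the hex dump in the report and counts how many lines uniq keeps under each locale.  The temporary file and the variable names are my own choices, and it assumes that an en_US.UTF-8 locale is installed; if it is not, uniq will probably fall back to byte-wise comparison.  The second count depends on the collation tables shipped with your libc, so it may differ from one system to another.

```shell
#!/bin/sh
# Recreate the file from the reported hex dump
# (E2 88 A8 0A / E2 88 A7 0A), written here as octal escapes.
tmp=$(mktemp) || exit 1
printf '\342\210\250\n\342\210\247\n' > "$tmp"

# C locale: lines are compared byte by byte, so the two distinct
# byte sequences are never merged.
c_count=$(LC_ALL=C uniq "$tmp" | wc -l | tr -d ' ')
echo "C locale:    $c_count line(s)"

# UTF-8 locale (assumed to be installed): lines are compared with the
# locale's collation rules, so the result depends on those rules.
utf8_count=$(LC_ALL=en_US.UTF-8 uniq "$tmp" 2>/dev/null | wc -l | tr -d ' ')
echo "en_US.UTF-8: $utf8_count line(s)"

rm -f "$tmp"
```

If a script needs byte-wise behavior regardless of the user's environment, setting LC_ALL=C for that one command, as above, is the usual way to get it.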