tag 35636 notabug
thanks

On 5/8/19 3:35 AM, Michele Liberi wrote:
> I verified the following bug is there in:
>
> - sort (GNU coreutils) 8.21
> - sort (GNU coreutils) 8.22
> - sort (GNU coreutils) 8.23
>
> *Input file:*
> # cat sort.in
> 1|a|x
> 2|b|x
> 3|aa|x
> 4|bb|x
> 5|c|x
>
>
> *shell command and output:*
> # sort -t'|' -k2 <sort.in
> 3|aa|x
> 1|a|x
> 4|bb|x
> 2|b|x
> 5|c|x

Let's use --debug to see what sort really did:

$ sort --debug -t'|' -k2 <sort.in
sort: using ‘en_US.UTF-8’ sorting rules
3|aa|x
  ____
______
1|a|x
  ___
_____
4|bb|x
  ____
______
2|b|x
  ___
_____
5|c|x
  ___
_____

Since you did not specify an ending field, you are comparing the string
"aa|x" with "a|x", and the string "a|x" with "bb|x"; in the en_US.UTF-8
locale, punctuation is ignored on the first-order pass through
strcoll(), which means you are effectively comparing "aax" with "ax"
with "bbx", and the sort is correct; but even in a locale that does not
ignore punctuation:

$ LC_ALL=C sort --debug -t'|' -k2 <sort.in
sort: using simple byte comparison
3|aa|x
  ____
______
1|a|x
  ___
_____
4|bb|x
  ____
______
2|b|x
  ___
_____
5|c|x
  ___
_____

the sort is still correct, since ASCII '|' sorts after ASCII 'a'.

Your real problem is that you are sorting on too much data; you need to
try again with the key limited to exactly the second field:

$ sort --debug -t'|' -k2,2 <sort.in
sort: using ‘en_US.UTF-8’ sorting rules
1|a|x
  _
_____
3|aa|x
  __
______
2|b|x
  _
_____
4|bb|x
  __
______
5|c|x
  _
_____

where now sort can see that "a" is a prefix of "aa" because it is no
longer bleeding on to the rest of the line.

> *I expected that key "a" to come before key "aa" and key "b" to come before
> key "bb".*

Your expectations are at odds with your incomplete command line.  sort
is behaving as required; therefore, I'm closing this as not a bug.  But
feel free to reply if you have further questions.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org
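
To reproduce the comparison above without depending on the local locale, a minimal sketch along these lines should do. It pins LC_ALL=C so that only byte order matters, and the temporary file and the variable names (input, whole, field) are illustrative choices, not part of the original report:

```shell
# Recreate the five-line input from the report in a temporary file.
# (The file and the variable names here are illustrative assumptions.)
input=$(mktemp)
printf '%s\n' '1|a|x' '2|b|x' '3|aa|x' '4|bb|x' '5|c|x' >"$input"

# -k2 with no end field: the key runs from field 2 to the end of the line,
# so "aa|x" is compared against "a|x", and 'a' sorts before '|' in ASCII.
whole=$(LC_ALL=C sort -t'|' -k2 "$input")

# -k2,2: the key is limited to exactly field 2, so "a" is compared with "aa".
field=$(LC_ALL=C sort -t'|' -k2,2 "$input")

printf 'key = field 2 to end of line:\n%s\n\n' "$whole"
printf 'key = field 2 only:\n%s\n' "$field"

rm -f "$input"
```

Only the second invocation should put "a" ahead of "aa" and "b" ahead of "bb", since the trailing "|x" no longer takes part in the comparison.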