tag 19142 notabug
close 19142
thanks

Roland Sieker wrote:
> I have noticed that sort seems to have problems when the LANG environment
> variable is set with language and country.

Sort is definitely affected by LANG. When neither LC_ALL nor LC_COLLATE
is set, LANG supplies the value for LC_COLLATE, which controls the
collation sequence. Different locales have different collating
sequences. I don't like that the English locales, such as my own
country's en_US.UTF-8 and others like en_GB.UTF-8, don't sort
"correctly" as far as I am concerned, but I can only accept it.

Sort order is actually determined by libc (strcoll) and affects much
more than sort. It also affects ls, the shell, and basically everything
on the system that sorts.

> It sorts OK like this, with LANG just the language.encoding:
> ( setenv LANG en.UTF-8 ; echo 'a\nb\na\n⺌\n⺕\n⺌' | sort )
> a
> a
> b

Are you sure "en.UTF-8" is a valid locale? It doesn't look like one to
me. I think it is an invalid locale and therefore libc is falling back
to the C/POSIX locale, which compares plain bytes.

> But not with LANG as language_country.encoding:
> ( setenv LANG en_GB.UTF-8 ; echo 'a\nb\na\n⺌\n⺕\n⺌' | sort )

Here "en_GB.UTF-8" is a valid locale, and en_GB.UTF-8 uses dictionary
sort ordering. Dictionary order folds case and ignores punctuation.

Try using the newish sort --debug option. It will help debug problems
such as this.

  $ printf "a\nb\na\n⺌\n⺕\n⺌\n" | env LC_ALL=en_US.UTF-8 sort --debug
  sort: using ‘en_US.UTF-8’ sorting rules
  ...

  $ printf "a\nb\na\n⺌\n⺕\n⺌\n" | env LC_ALL=en.UTF-8 sort --debug
  sort: using simple byte comparison
  ...

See also the FAQ entry:

  https://www.gnu.org/software/coreutils/faq/coreutils-faq.html#Sort-does-not-sort-in-normal-order_0021

Bob
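
To check several locale names at once, the sort --debug test can be
wrapped in a small shell function. This is a minimal sketch, assuming
GNU coreutils sort (other sort implementations may lack --debug) and
untranslated English diagnostics; the function name collation_rules is
hypothetical. It only looks for the "sorting rules" and "simple byte
comparison" phrases that sort --debug prints.

```sh
#!/bin/sh
# Hypothetical helper: report which collation GNU sort would use for a
# given locale name. Assumes GNU coreutils sort with --debug and English
# (untranslated) diagnostic messages.
collation_rules() {
    # sort --debug reports its collation on stderr even for empty input,
    # so merge stderr into the pipe and keep only the relevant line.
    LC_ALL=$1 sort --debug </dev/null 2>&1 |
        grep -E 'sorting rules|simple byte comparison'
}

# Compare a few candidate locale names.
for loc in C en.UTF-8 en_GB.UTF-8 en_US.UTF-8; do
    printf '%s: ' "$loc"
    collation_rules "$loc" || echo '(no collation line reported)'
done
```

If a name such as en.UTF-8 reports the same simple byte comparison as C,
it is not a usable locale on that system and libc has fallen back to
C/POSIX.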