I have compiled my own 'sort' which deliberately ignores locale, (more precisely deliberately uses the 'C' locale by default) for exactly this reason. I don't want to screw with an environment variable that affects dozens of things just to get sort to work predictably.
A while ago I offered a patch to put a -locale argument in sort (so I could alias it to 'C' locale) but it was rejected by the maintainers. So, screw it, I made my own. Some political committee or other has banned simple, efficient, predictable semantics, so it won't be in any distribution any time soon. But if you value simple efficient predictable semantics, I suggest you do the same.

Bear

On 04/17/2015 09:26 AM, Eric Blake wrote:
> On 04/17/2015 10:10 AM, Peng Yu wrote:
>> Hi, I got the following results when I call sort with -t /. It seems
>> that 'a/1.txt' should be right after 'a'. Is it the case? Or I am not
>> using sort correctly?
>
> Your assumption is correct - you are using sort incorrectly, by failing
> to take locales into account, and by failing to limit the amount of data
> being compared to single field widths.
>
>>
>> $ printf '%s\n' a 'a!' ab aB a/1.txt | sort -t / -k 1 -k 2 -k 3 -k 4
>> a
>> a!
>> a/1.txt
>> aB
>> ab
>
> sort --debug is your friend:
>
> $ printf '%s\n' a 'a!' ab aB a/1.txt | sort --debug -t / -k 1 -k 2 -k 3 -k 4
> sort: using ‘en_US.UTF-8’ sorting rules
> a
> _
>  ^ no match for key
>  ^ no match for key
>  ^ no match for key
> _
> a!
> __
>   ^ no match for key
>   ^ no match for key
>   ^ no match for key
> __
> a/1.txt
> _______
>   _____
>        ^ no match for key
>        ^ no match for key
> _______
> ab
> __
>   ^ no match for key
>   ^ no match for key
>   ^ no match for key
> __
> aB
> __
>   ^ no match for key
>   ^ no match for key
>   ^ no match for key
> __
>
>
> As shown in the debug trace, the line 'a!' sorts prior to the line
> 'a/1.txt' because your first sort key is the entire line, and in the
> locale you are using (where both '!' and '/', and also '.', are ignored
> in collation orders), the collation string "a" really does come before
> "a1txt".
>
> What you REALLY want is to limit your sorting to a single field at a
> time (-k1,1 rather than -k), as in:
>
> $ printf '%s\n' a 'a!' ab aB a/1.txt | sort --debug -t / -k 1,1 -k 2,2
> sort: using ‘en_US.UTF-8’ sorting rules
> a
> _
>  ^ no match for key
> _
> a/1.txt
> _
>   _____
> _______
> a!
> __
>   ^ no match for key
> __
> ab
> __
>   ^ no match for key
> __
> aB
> __
>   ^ no match for key
> __
>
>
> Or additionally, to limit your sorting to a locale that does not discard
> punctuation as unimportant, as in:
>
> $ printf '%s\n' a 'a!' ab aB a/1.txt | LC_ALL=C sort --debug -t / -k 1,1 -k 2
> sort: using simple byte comparison
> a
> _
>  ^ no match for key
> _
> a/1.txt
> _
>   _____
> _______
> a!
> __
>   ^ no match for key
> __
> aB
> __
>   ^ no match for key
> __
> ab
> __
>   ^ no match for key
> __
>
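For anyone who wants byte-order sorting without rebuilding sort or exporting LC_ALL for the whole session, a small wrapper function may be enough. This is only a sketch: the name csort is made up for illustration (it is not part of coreutils and is not the custom build described above), and it assumes a POSIX-style shell where a variable assignment prefixed to an external command applies to that one command only.

```shell
# csort: hypothetical helper name, not part of coreutils.
# The LC_ALL=C prefix is scoped to this single sort invocation,
# so the caller's locale settings are left untouched.
csort() {
    LC_ALL=C sort "$@"
}

# Same input as in the thread, with each key limited to one field.
printf '%s\n' a 'a!' ab aB a/1.txt | csort -t / -k 1,1 -k 2,2
```

With byte comparison and per-field keys this should give the same order as the last trace in the quoted message: a, a/1.txt, a!, aB, ab.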