According to Voelker, Bernhard on 3/2/2010 1:34 AM:
> I understand that the sort order depends on the locale, i.e. LC_ALL,
> but this doesn't explain the differences I get on Solaris 5.10, SLES 10.1,
> and Cygwin (given that sort didn't change about this point in the past).

The difference is that all three use different locale installations.

>
> # === Solaris SunOS 5.10, sort 6.10 ===
> $ printf "ru.unix /h\nru.unix.ftn /h\nru.unix.prog /h" | LC_ALL=C sort
> ru.unix /h
> ru.unix.ftn /h
> ru.unix.prog /h
> $ printf "ru.unix /h\nru.unix.ftn /h\nru.unix.prog /h" | LC_ALL=POSIX sort
> ru.unix /h
> ru.unix.ftn /h
> ru.unix.prog /h

C and POSIX are strictly identical, on all machines.  If they ever behave
differently from one another, on the same machine, or when comparing two
machines, then you have found a bug and should report it to that vendor.

> $ printf "ru.unix /h\nru.unix.ftn /h\nru.unix.prog /h" | LC_ALL=en_US sort
> ru.unix /h
> ru.unix.ftn /h
> ru.unix.prog /h

That just means that Solaris' rules for en_US don't ignore punctuation.
You can use locale(1) to learn more about the collation rules that will be
selected when you enable that locale.

> # === SLES 10.1, kernel 2.6.16.60-0.23-smp, sort 5.93 ===
> $ printf "ru.unix /h\nru.unix.ftn /h\nru.unix.prog /h" | LC_ALL=en_US sort
> ru.unix.ftn /h
> ru.unix /h
> ru.unix.prog /h

Yep, glibc's locale installation ignores punctuation for en_US.  And
glibc's locale installation is probably the most complete one out there.

> $ sort --version
> sort (GNU coreutils) 5.93

Time to consider upgrading - the latest stable version is 8.4, and there
have been some bugs fixed in sort in the meantime.

> # === Cygwin on XPSP3, CYGWIN_NT-5.1 1.7.1(0.218/5/3), sort 7.0 ===
> $ printf "ru.unix /h\nru.unix.ftn /h\nru.unix.prog /h" | LC_ALL=en_US sort
> ru.unix /h
> ru.unix.ftn /h
> ru.unix.prog /h

Yep, cygwin 1.7.1 silently treats all LC_COLLATE in the C locale
(basically, no one had implemented the internals to convert the windows
notion of collation over to the POSIX api); it will improve for cygwin
1.7.2.  But cygwin is still different than glibc; it only supports locales
known to windows, rather than the glibc approach of letting you install
your own locales to a specific directory.
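
If the goal is output that is the same on all three machines, one option is to pin collation to the C locale for the sort call. The following is only a minimal sketch; the wrapper name sort_c is made up for illustration and is not part of coreutils.

```shell
# sort_c is a hypothetical wrapper (not a coreutils command): it forces
# byte-order collation by running sort with LC_ALL=C, so the result does
# not depend on which locale data the system has installed.
sort_c() {
    LC_ALL=C sort "$@"
}

# Same three lines as in the report above, fed in a shuffled order.
printf 'ru.unix.prog /h\nru.unix /h\nru.unix.ftn /h\n' | sort_c
# Prints:
#   ru.unix /h
#   ru.unix.ftn /h
#   ru.unix.prog /h
```

In the C locale the space (0x20) compares lower than the period (0x2E), which is why "ru.unix /h" comes first here, matching the LC_ALL=C output quoted above.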

> It seems that sort doesn't depend on LC_ALL on Solaris and Cygwin,
> but it does on Linux. Besides LC_ALL, what does the sort order depend
> on? Build settings?

LC_ALL takes precedence.  But if LC_ALL is unset, then it is up to
LC_COLLATE; and if that is unset, then LANG; and if that is unset, then it
is system-specific.

-- 
Don't work too hard, make some time for fun as well!

Eric Blake [email protected]
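
P.S. The lookup order for collation (LC_ALL, then LC_COLLATE, then LANG) can be sketched as a small shell function. This is only an illustration: effective_collate is a made-up helper name, and it reports just the locale name that would be selected, not whether that locale is actually installed on the system.

```shell
# effective_collate is a hypothetical helper, not a standard utility.
# It prints the locale name that governs collation, checking LC_ALL,
# then LC_COLLATE, then LANG. Empty values are treated as unset.
effective_collate() {
    if [ -n "${LC_ALL:-}" ]; then
        printf '%s\n' "$LC_ALL"
    elif [ -n "${LC_COLLATE:-}" ]; then
        printf '%s\n' "$LC_COLLATE"
    elif [ -n "${LANG:-}" ]; then
        printf '%s\n' "$LANG"
    else
        # Nothing set: the choice is left to the system.
        printf '%s\n' 'system-specific default'
    fi
}

# Show which setting applies in the current environment.
effective_collate
```

Running it next to the output of locale(1) on each machine should make it easier to tell whether a difference comes from the environment or from the installed locale data.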
