tag 36674 notabug
close 36674
stop

Hello,
On Mon, Jul 15, 2019 at 11:42:01AM -0700, Marshall Lake wrote:
> Even though this isn't a bug, I was asked to send the following to this
> email address.

(General suggestions and discussions are better suited for the
coreut...@gnu.org mailing list. That way the system won't open a new bug
item.)

> Re: SORT Command from GNU coreutils 8.25
>
> A suggestion for an additional option to the SORT command is to ignore
> non-alphanumeric characters.
>
> As an example, in attempting to sort an index ...
>
> Abbott, William 259
>
> sorts before:
>
> Abbot, William 099
>
> If non-alphanumeric characters were ignored then the same two records
> would sort as:
>
> Abbot, William 099
> Abbott, William 259

There's actually something else at play here: in your case, sort does
ignore non-alphanumeric characters, but it ALSO ignores white space.

That happens because your locale is set to a specific language (for
example, en_US.UTF8). Using such a locale makes sort ignore all
non-alphanumeric characters, whitespace, and upper/lower case. In essence,
you are comparing "AbbottWilliam" (two 't's) to "AbbotWilliam" (one 't').
The second 't' is then compared to the 'W' of "William", and 't' is
determined to come first.

If you force the POSIX/C locale, then all characters are considered, and
the result will be as you requested. Observe the following:

$ printf "%s\n" AbbottWilliam AbbotWilliam | LC_ALL=en_CA.utf8 sort
AbbottWilliam
AbbotWilliam

$ printf "%s\n" "Abbott William" "Abbot William" | LC_ALL=en_CA.utf8 sort
Abbott William
Abbot William

$ printf "%s\n" "Abbott William" "Abbot William" | LC_ALL=C sort
Abbot William
Abbott William

$ printf "%s\n" "Abbott, William" "Abbot, William" | LC_ALL=C sort
Abbot, William
Abbott, William

Note that 'sort' already has an option for dictionary-style sorting:

  -d, --dictionary-order
         consider only blanks and alphanumeric characters
However, locale rules take precedence over it, so effectively it only
works in the "C" locale:

$ printf "%s\n" "Ab,,b,,ott William" "Abbot William" | LC_ALL=C sort
Ab,,b,,ott William
Abbot William

$ printf "%s\n" "Ab,,b,,ott William" "Abbot William" | LC_ALL=C sort -d
Abbot William
Ab,,b,,ott William

You can read past discussions about the confusion resulting from locale
sorting rules here:
https://debbugs.gnu.org/11621
https://debbugs.gnu.org/12783

As such, I'm closing this as "not a bug", but discussion can continue by
replying to this thread.

-assaf
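For the index in the original report, here is a minimal sketch of an invocation that should give the requested order, assuming GNU coreutils sort and the C locale (which is always available). The lower-case "abbot, william 099" record in the second command is a hypothetical variant, not from the report. It shows that -f (fold lower case to upper case) can be combined with -d so that capitalisation is ignored as well.

```shell
# Assumes GNU coreutils sort. LC_ALL=C selects plain byte-wise comparison
# instead of the locale's collation rules; -d then compares only blanks
# and alphanumeric characters, skipping the commas.

# The two index records from the report.
printf '%s\n' 'Abbott, William 259' 'Abbot, William 099' | LC_ALL=C sort -d
# Abbot, William 099
# Abbott, William 259

# Hypothetical mixed-case records: in the C locale upper case sorts before
# lower case, so with -d alone 'Abbott' would come first. Adding -f folds
# case before comparing.
printf '%s\n' 'Abbott, William 259' 'abbot, william 099' | LC_ALL=C sort -d -f
# abbot, william 099
# Abbott, William 259
```

Because the C locale works byte by byte, this suits plain ASCII data. Non-ASCII letters are not treated as alphanumeric there, so -d would skip them.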