On 2/18/25 7:45 PM, Rupert Gallagher via GNU coreutils Bug Reports wrote:
> By comparison, human (-h) and numeric (-n) sort cause data loss:

Not really.  That's the difference between
a)
  "I have a list containing numbers; I merely care about numbers and want to get a 
unique, sorted list of them."
  ('sort -h -u')

and
b)
  "I have a list containing numbers; I want to have it sorted by numbers, and then 
throw away duplicates."
  ('sort -h | uniq')

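The point is: in case a), uniqueness is decided by the numeric key alone, and
the numerical value of each non-number entry is Zero.  So all such entries count
as duplicates of one another (and of a literal 0), and only one of them is kept.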
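
To put a) and b) side by side, here is a minimal sketch. The helper names
(input, unique_by_key, unique_by_line) are made up for illustration, -n stands
in for -h, and the input is the same list as in the examples further down.

```shell
# Hypothetical helper names; the list is the one used in the examples below.
input() { printf '%s\n' 0 1 X-1 Ab2 3 ma; }

# Case a): uniqueness is decided by the numeric key alone.
unique_by_key() { LC_ALL=C sort -nu; }

# Case b): sort numerically, then drop only identical adjacent lines.
unique_by_line() { LC_ALL=C sort -n | LC_ALL=C uniq; }

input | unique_by_key    # 3 lines survive
input | unique_by_line   # all 6 lines survive
```

In case b) nothing is dropped, because no two input lines are identical; the
four zero-valued lines are merely grouped together ahead of 1 and 3.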

Consider the following:

  $ printf "%s\n" 0 1 X-1 Ab2 3 ma | LC_ALL=C sort -nu
  0
  1
  3

Here, the entries 0, "X-1", "Ab2" and "ma" all have the numerical value 0.
That's why only the first of them, the literal 0, is output.

Now let's remove the literal/numerical 0 from the input:

  $ printf "%s\n"  1 X-1 Ab2 3 ma | LC_ALL=C sort -nu
  X-1
  1
  3

Now, the first entry with the numerical value 0 is "X-1".
Now let's put the 0 back into the input, but at the end:

  $ printf "%s\n"  1 X-1 Ab2 3 ma 0 | LC_ALL=C sort -nu
  X-1
  1
  3

Still, sort(1) outputs the first entry which has a numerical value of Zero: 
"X-1".

Have a nice day,
Berny



