bug#76290: "sort -u" vs "sort -h -u": possible bug

Rainer Canavan Wed, 19 Feb 2025 11:16:44 -0800

On 19.02.25 18:14, Bernhard Voelker wrote:

On 2/18/25 7:45 PM, Rupert Gallagher via GNU coreutils Bug Reports wrote:

By comparison, human (-h) and numeric (-n) sort cause data loss:


Not really.  That's the difference between
a)

"I have a list containing numbers; I merely care about numbers and want to get a unique, sorted list of them."

  ('sort -h -u')

and
b)

"I have a list containing numbers; I want to have it sorted by numbers, and then throw away duplicates."

  ('sort -h | uniq')

The point is: in case a), the numerical value of each non-number entry is zero.

I have no issue with the way 'sort -u' currently works, but the man page isn't clear at all about the fact that 'sort -h -u' and 'sort -h | uniq' behave differently.
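To illustrate the difference, here is a minimal sketch, assuming GNU sort with -h support. The sample input (apple, banana, 1K) is made up for illustration and is not taken from the original report.

```shell
# Hypothetical sample input: two non-numeric lines and one duplicated number.
input=$(printf 'apple\nbanana\n1K\n1K\n')

# Case a): uniqueness is decided by the -h comparison itself, so the two
# non-numeric lines (both treated as zero) count as one equal run.
unique_by_key=$(printf '%s\n' "$input" | LC_ALL=C sort -h -u)

# Case b): sort by -h first, then let uniq drop byte-identical adjacent lines.
unique_by_line=$(printf '%s\n' "$input" | LC_ALL=C sort -h | uniq)

printf 'sort -h -u     : %s line(s)\n' "$(printf '%s\n' "$unique_by_key" | grep -c '')"
printf 'sort -h | uniq : %s line(s)\n' "$(printf '%s\n' "$unique_by_line" | grep -c '')"
```

With GNU sort, the first pipeline should keep fewer lines than the second, because 'apple' and 'banana' compare as equal under -h but are not identical as text.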


Specifically, the explanation for -u

-u, --unique

with -c, check for strict ordering; without -c, output only the first of an equal run

does not explain what 'equal' or 'run' may mean. Maybe add something like "where equality is assessed only based on the keys and rules used to sort the output".
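The same effect shows up with explicit keys. A small sketch, again assuming GNU sort and a made-up input: when the key is restricted to the first field, -u treats lines as equal as soon as that field matches, regardless of the rest of the line.

```shell
# Hypothetical input: two lines share the first field but differ afterwards.
records=$(printf 'a 1\na 2\nb 3\n')

# Equality is assessed on the key only (field 1), so 'a 1' and 'a 2' form one run.
by_key=$(printf '%s\n' "$records" | LC_ALL=C sort -u -k1,1)

# uniq compares whole lines, so all three distinct lines survive.
by_line=$(printf '%s\n' "$records" | LC_ALL=C sort -k1,1 | uniq)

printf 'sort -u -k1,1     : %s line(s)\n' "$(printf '%s\n' "$by_key" | grep -c '')"
printf 'sort -k1,1 | uniq : %s line(s)\n' "$(printf '%s\n' "$by_line" | grep -c '')"
```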



Rainer
