On 19.02.25 18:14, Bernhard Voelker wrote:
> On 2/18/25 7:45 PM, Rupert Gallagher via GNU coreutils Bug Reports wrote:
>> By comparison, human (-h) and numeric (-n) sort cause data loss:
>
> not really. That's the difference between
>
> a)
>    "I have a list containing numbers; I merely care about numbers and
>    want to get a unique, sorted list of them."
>    ('sort -h -u')
>
> and
>
> b)
>    "I have a list containing numbers; I want to have it sorted by
>    numbers, and then throw away duplicates."
>    ('sort -h | uniq')
>
> The point is: in case a), the numerical value of each non-number entry
> is Zero.

I have no issue with the way 'sort -u' is currently working, but the man
page isn't clear at all about the fact that 'sort -h -u' and 'sort -h |
uniq' behave differently.
Specifically, the explanation for -u

       -u, --unique
              with -c, check for strict ordering; without -c, output
              only the first of an equal run

does not explain what 'equal' or 'run' means. Maybe add something like
"where equality is assessed based only on the keys and rules used to
sort the output".

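
For illustration, here is a minimal sketch of the difference between
'sort -h -u' and 'sort -h | uniq'. The sample input (two words and two
numbers) is made up for this example, and LC_ALL=C is only there to keep
the comparison independent of the locale. With GNU sort I would expect
the first count to be 3, because the two words both compare as zero under
-h and only one of them survives, and the second count to be 4, because
uniq only drops lines that are identical.

```shell
# Hypothetical sample input: two non-numeric lines and two numbers.
input=$(printf 'apple\nbanana\n1\n2\n')

# a) Uniqueness is decided by the -h comparison alone, so lines that
#    compare equal under -h collapse into a single output line.
count_a=$(printf '%s\n' "$input" | LC_ALL=C sort -h -u | wc -l | tr -d ' ')

# b) Sort by -h first, then let uniq drop only adjacent identical lines.
count_b=$(printf '%s\n' "$input" | LC_ALL=C sort -h | uniq | wc -l | tr -d ' ')

echo "sort -h -u:     $count_a lines"
echo "sort -h | uniq: $count_b lines"
```

Counting lines rather than printing them keeps the example independent of
which of the equal lines -u happens to keep.
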
Rainer