Re: Support in sort for human-readable numbers

Vitali Lovich Wed, 07 Jan 2009 00:33:59 -0800

On Wed, Jan 7, 2009 at 3:17 AM, Jim Meyering <j...@meyering.net> wrote:
> "Vitali Lovich" <vlov...@gmail.com> wrote:
> ...
>>> because I think it's a common enough format and getting
>>> more common since it's an IEC defined standard.
>>>
>>>> and wouldn't be better served by
>>>> pre-processing the text before sort & post-processing it after as
>>>> necessary?
>>>
>>> that's a little awkward and inefficient.
>>>
>>>> Supporting all the various ways the human_readable can be output is
>>>> just not practical or even useful
>>>
>>> just ignore an optional trailing iB is all I'm suggesting.
>
> I agree that ignoring the valid suffixes is worthwhile.
> Remember that df and du support the --block-size option,
> (or equivalently, via one of the BLOCK_SIZE envvars)
> so people may be running "du --block-size=MiB -s /tmp",
> which produces output like this:
Well then, once this basic implementation is in, add a --block-size
long option to sort to support this if you feel it's worthwhile.  My
reasoning is that without sort -h, achieving that behaviour takes a
non-trivial shell script.  With sort -h, it's straightforward to sort
the output of du --block-size=MiB: pass it through sed before sort to
turn MiB into M, then pass the output of sort through sed again to put
the MiB back.  However, since support for this is being requested,
I'll provide a reference implementation, ifdeffed out, so that you can
judge whether the additional code is really worthwhile.
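
For illustration, here is a minimal sketch of that sed / sort / sed
round trip.  It assumes a sort that already has the -h option under
discussion, and the sample sizes and paths are made up rather than
real du output:

```shell
# Sample lines shaped like `du --block-size=MiB` output (made-up values).
input=$(printf '27MiB\t/tmp\n3MiB\t/var/tmp\n110MiB\t/home\n')

# 1. Strip the "iB" so sort -h sees a bare M suffix.
# 2. Sort on the first field as human-readable numbers.
# 3. Put the "iB" back on the leading size field.
result=$(printf '%s\n' "$input" |
  sed 's/^\([0-9]*\)MiB/\1M/' |
  sort -h -k1,1 |
  sed 's/^\([0-9]*\)M/\1MiB/')

printf '%s\n' "$result"
```

Both sed expressions are anchored to the start of the line, so an "M"
or "MiB" that happens to appear in a path name is left alone.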


>
> 27MiB   /tmp
>
> As such, I'd like GNU sort to handle input like that.
> You could argue that the numbers might have "." or ","
> thousands separators:
>
>  $ LC_ALL=en_US du --block-size=\'1 -s /tmp
>  27,905,024      /tmp
>
> but I don't think it's worthwhile to parse those.
Well, in this case, sort -h would parse it the same as sort -n.  So
that is a different discussion.
>
>>> If it's difficult or inefficient then don't worry about it.
>
>> Right, but you have to deal with terminating characters and whatnot.
>> I mean it's not super difficult obviously.  I'm just wondering why
>> that logic even belongs in sort.  The rule of thumb is - the less code
>> you write, the fewer bugs you'll have.
>
> Sure, but usability counts, too.
> If we stuck mindlessly to the less-code-is-better mantra,
> coreutils would look very different than it does now.
> Finding the right balance isn't easy.
Isn't that the beauty (i.e. frustration) of writing a software library?
;)  My argument was just that the additional code isn't necessarily
justified, because the user can get the same result with trivial shell
scripting, which is arguably much easier and cheaper to fix.  For
instance, if this code turns out to have a bug, that'll screw every
distro that ships it (e.g. egrep on Ubuntu didn't work for a long time
just due to having to maintain compatibility).  However, the extra
bit of thinking on the user's part about how to adjust the input &
output to get the desired behaviour has far lower costs associated
with it.  Nevertheless, I am bowing to what appears to be popular
demand (since I have not heard any support for my initial position),
and will add this when I get the time.


_______________________________________________
Bug-coreutils mailing list
Bug-coreutils@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-coreutils
