On Wed, Jan 7, 2009 at 3:17 AM, Jim Meyering <j...@meyering.net> wrote:
> "Vitali Lovich" <vlov...@gmail.com> wrote:
> ...
>>> because I think it's a common enough format and getting
>>> more common since it's an IEC defined standard.
>>>
>>>> and wouldn't be better served by
>>>> pre-processing the text before sort & post-processing it after as
>>>> necessary?
>>>
>>> that's a little awkward and inefficient.
>>>
>>>> Supporting all the various ways the human_readable can be output is
>>>> just not practical or even useful
>>>
>>> just ignore an optional trailing iB is all I'm suggesting.
>
> I agree that ignoring the valid suffixes is worthwhile.
> Remember that df and du support the --block-size option,
> (or equivalently, via one of the BLOCK_SIZE envvars)
> so people may be running "du --block-size=MiB -s /tmp",
> which produces output like this:

Well then, once this basic implementation is in, add a --block-size long option to sort to support this feature if you feel it's worthwhile. My reasoning is that without sort -h, achieving that behaviour requires a non-trivial shell script. With this feature, sorting the output of du --block-size=MiB is trivial: pass it through sed before sort to turn MiB into M, then pass the output of sort through sed again to put the MiB back. However, since support for this seems to be in demand, I'll provide a reference implementation, #ifdef'd out, so that you can decide whether the additional code is really worthwhile.
>
>   27MiB /tmp
>
> As such, I'd like GNU sort to handle input like that.
> You could argue that the numbers might have "." or ","
> thousands separators:
>
>   $ LC_ALL=en_US du --block-size=\'1 -s /tmp
>   27,905,024 /tmp
>
> but I don't think it's worthwhile to parse those.

Well, in this case, sort -h would parse it the same way as sort -n, so that is a separate discussion.

>
>>> If it's difficult or inefficient then don't worry about it.
>
>> Right, but you have to deal with terminating characters and whatnot.
>> I mean it's not super difficult obviously. I'm just wondering why
>> that logic even belongs in sort. The rule of thumb is - the less code
>> you write, the fewer bugs you'll have.
>
> Sure, but usability counts, too.
> If we stuck mindlessly to the less-code-is-better mantra,
> coreutils would look very different than it does now.
> Finding the right balance isn't easy.

Isn't that the beauty (i.e. frustration) of writing a software library ;)

My argument was just that the additional code isn't necessarily justified, because users can achieve the same result with simple shell scripting, which is arguably much easier and cheaper to fix. For instance, if this code turns out to have a bug, that'll screw every distro that ships it (e.g. egrep on Ubuntu didn't work for a long time just because of the need to maintain compatibility). By contrast, the bit of extra thinking on the user's part about how to adjust the input & output to get the desired behaviour costs far less.

Nevertheless, I am bowing to what appears to be popular demand (since I have not heard any support for my initial position), and will add this when I get the time.

_______________________________________________
Bug-coreutils mailing list
Bug-coreutils@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-coreutils