On Wed, Jan 7, 2009 at 2:52 AM, Paul Eggert <egg...@cs.ucla.edu> wrote:
> I looked at the patch
> <http://launchpadlibrarian.net/20858726/sort_human_v3.diff> and have a
> few comments:
>
> * There's no documentation, either in the .texi file or in the --help
> string. That's often the hardest part to get right.

I agree - that's why I'm waiting to get the implementation stabilized
before trying to write it (along with the necessary test cases). That
will be done, but there's no point documenting right now since the
behaviour hasn't been finalized.

> * The patch slows down "sort -n" a bit, and (worse) complicates the code
> for "sort -n". There are a lot of gotos.

Oh no - scary gotos. Seriously - gotos have their place, and there is
appropriate usage, especially when it comes to minimizing code
duplication within a function (don't believe me? go look at the Linux
kernel or any significant piece of C code). In fact, gotos are quite
cheap: if you know CPUs, you'll realize an unconditional goto is a
simple jmp instruction which does nothing to the CPU pipeline;
conditional jumps are far more problematic. Some of those gotos are
unnecessary (they aren't removing any code duplication), but I think
they make the code easier to read, and I'm quite positive that any
decent compiler will optimize away any unnecessary jumps. Code
readability is far more important than that last 0.1% improvement. I
challenge you to show me one benchmark where the behaviour is
demonstrably slower by even a modest percentage (let's say 3%) and the
slowdown is not caused by the branching over which comparison to use.

> Better, I think, would be
> to (1) search for a trailing suffix character, and compare based on
> that, and then (2) fall back on numeric comparison only if the
> trailing suffix characters match. This will certainly make plain
> "sort -n" faster and will result in a less-intrusive change to the
> existing code; arguably it will make the new code faster in many
> cases, too.

Did you read my explanation above for your exact "optimization"? Please
explain to me, in a somewhat detailed manner, how this improves
behaviour, because I'm failing to see the algorithm. Here's what I think
you're suggesting (there's a concrete C sketch of this further down):

    for (each character c in string a)
      if (c is not a digit)
        break
    for (each character d in string b)
      if (d is not a digit)
        break
    if (suffix starting at c != suffix starting at d)
      return whichever one is greater
    perform the regular sort -n comparison between a & b

You must be thinking that that last step is somehow magical and free,
because I see it as once again looping over the string. You may be
thinking to yourself - yeah, but there's no expensive comparison between
the two strings. OK - I'll grant you that. So you've sped up the best
case, where the suffixes don't match (more on this later). Now look at
the worst case, where the suffixes match: you have to scan over each
string again, this time performing the numeric comparison, so you've
made your worst case slower. I have a feeling that this worst case is
also likely to be the average case.
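
For concreteness, here's roughly how I read that proposal in C. This is
only my own sketch, to make sure we're arguing about the same thing -
the helper names are invented stand-ins, not code from the patch or
from sort.c:

    #include <ctype.h>
    #include <string.h>

    /* Stand-in for the existing plain "sort -n" comparison, whatever
       it is actually called in sort.c.  */
    extern int numcompare (const char *a, const char *b);

    /* Invented helper: map a suffix character to its relative
       magnitude; anything that isn't a recognised suffix ranks lowest.  */
    static int
    suffix_rank (unsigned char suffix)
    {
      static char const order[] = "KMGTPEZY";
      const char *p = suffix ? strchr (order, suffix) : NULL;
      return p ? (int) (p - order) + 1 : 0;
    }

    static int
    suffix_first_compare (const char *a, const char *b)
    {
      const char *pa = a;
      const char *pb = b;

      /* Pass 1: walk past the digits (ignoring signs, blanks and
         decimal points for brevity) to find each trailing suffix.  */
      while (isdigit ((unsigned char) *pa))
        pa++;
      while (isdigit ((unsigned char) *pb))
        pb++;

      /* Differing suffixes decide the order immediately.  */
      if (*pa != *pb)
        {
          int diff = suffix_rank ((unsigned char) *pa)
                     - suffix_rank ((unsigned char) *pb);
          /* Two distinct unrecognised suffixes both rank 0; fall back
             to byte order so the comparison stays deterministic.  */
          return diff ? diff
                      : ((unsigned char) *pa < (unsigned char) *pb ? -1 : 1);
        }

      /* Matching suffixes: rescan both strings from the start with the
         ordinary numeric comparison - the second pass I'm objecting to.  */
      return numcompare (a, b);
    }

The point being: whenever the suffixes match, both strings get walked
twice.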

Also, let's take it that you have managed to speed up the best case.
But have you really? What have you saved: the cost of a subtraction n
times (n being the length of the number) over a span of x comparisons.
Do you honestly believe that n * x will be sufficiently big in any
reasonable test case that this saving will actually be realized? x, by
the way, will be approximately m log m, where m is the number of
strings, so your cost savings over the course of the call to sort will
be about n * m log m subtractions, for m strings of length n. Let's
take a reasonable n = 3 (after all, a higher n on average means you
would be using a bigger suffix anyway) and say you are sorting a
million strings. That's about 1 000 000 * log2(1 000 000), call it 20
million comparisons, times 3 subtractions each, or roughly 60 million
subtractions saved. Let's be generous here and say that a subtraction
takes 20 clock cycles (I don't feel like digging up my FPGA work to
figure out how many clock cycles my implementation took, let alone the
far better ones used in modern processors); on a modern 2.0 GHz machine
that's about 10 ns each.

So for a million strings, you've managed to save a whopping 600 million
nanoseconds. Wait a minute - that's about 0.6 seconds. So every million
strings, your optimization, in the best case, will shave off roughly
0.6 seconds. Excellent. Oh - and m has a much larger impact than n, so
sorting more strings will matter far more than sorting longer strings.
Now repeat the above analysis for the worst case (suffixes equal), and
I'm quite confident you'll see a larger hit, although probably still
nothing significant. Hopefully I didn't embarrass myself with a glaring
mistake - it's really late.

> * Please don't take the address of an often-used local variable; this
> hurts optimization, since it often inhibits optimizing that variable
> into a register.

Could you please provide me with some context about where this is
happening? I can't find it in the code - a line number in the patch
would be great.

> * I suggest ignoring any suffix after the 'k' or 'M' or 'G' or whatever;
> it's simpler.

And extremely incorrect - please read the discussion above.
Essentially, you then run into the problem of malformed input:
"2Klingons" would be treated the same as "2K Klingons". Granted, this
is a contrived example, but I hope you see my point.

> * I also suggest allowing only the upper-case version (e.g., 'P', not
> 'p'), as this will allow later support for ISO prefixes like 'p' (for
> "pico") better. (It's OK to allow both 'k' and 'K' to mean kilo,
> though.)

Agreed, although in this context they are numeric suffixes (pico is the
unit prefix, but units play no role in this discussion). The problem
then becomes that I suddenly have to worry about which suffixes can be
either case and which must be exactly one. Additionally, you've got
problems like mu for micro, which I'm not even sure can be represented
in ASCII. Also, 'h' and 'da' are hardly ever used, I think (and, even
better, 'da' breaks the pattern of one-character suffixes). So
supporting the full set seems questionable.

> * exponent_order should be a simple indexed lookup like "table[a]";
> there shouldn't be a need for a loop checking for "inverse_table[i] ==
> a".

Is the memory-speed tradeoff really worth it? The table isn't that big
to begin with, so you're trading 256 bytes for (in the best case) a
saving of, what, maybe 5 subtractions per suffix comparison? I'd
probably concede your point on this tomorrow after a night's sleep.
(I've sketched what I understand you to mean at the end of this mail.)

> Hope this helps.
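
P.S. Coming back to the exponent_order point, here's the shape of the
trade-off as I understand it. Again, this is my own sketch with made-up
names, not the actual code from the patch:

    #include <stddef.h>

    /* The recognised suffixes, in increasing order of magnitude.  */
    static char const suffix_chars[] = "KMGTPEZY";

    /* Roughly the shape of what I have now: a short loop over the
       suffix list.  */
    static int
    exponent_order_loop (unsigned char c)
    {
      for (size_t i = 0; suffix_chars[i]; i++)
        if (suffix_chars[i] == c)
          return i + 1;
      return 0;                     /* not a recognised suffix */
    }

    /* What I take you to be suggesting: spend 256 bytes on a direct
       index so each lookup is a single load instead of a loop.  */
    static signed char exponent_table[256];

    static void
    init_exponent_table (void)
    {
      for (size_t i = 0; suffix_chars[i]; i++)
        exponent_table[(unsigned char) suffix_chars[i]] = i + 1;
      exponent_table['k'] = exponent_table['K'];  /* 'k' and 'K' both mean kilo */
    }

    static int
    exponent_order_table (unsigned char c)
    {
      return exponent_table[c];
    }

The table costs one initialization pass and 256 bytes, and turns each
lookup into a single memory access - which is probably the fair trade
you're pointing at.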