Oh, and if my prior e-mail seems a bit abrupt, I apologize. I don't mean to be rude. It is getting late, and I probably should've waited until morning to compose my reply.

On Wed, Jan 7, 2009 at 4:21 AM, Vitali Lovich <vlov...@gmail.com> wrote:
> On Wed, Jan 7, 2009 at 2:52 AM, Paul Eggert <egg...@cs.ucla.edu> wrote:
>> I looked at the patch
>> <http://launchpadlibrarian.net/20858726/sort_human_v3.diff> and have a
>> few comments:
>>
>> * There's no documentation, either in the .texi file or in the --help
>> string. That's often the hardest part to get right.
>
> I agree - that's why I'm waiting for the implementation to stabilize
> before writing it (along with the necessary test cases). It will be
> done, but there's no point documenting right now since the behaviour
> hasn't been finalized.
>
>> * The patch slows down "sort -n" a bit, and (worse) complicates the code
>> for "sort -n". There are a lot of gotos.
>
> Oh no - scary gotos. Seriously, gotos have their place, and there is
> appropriate usage, especially when it comes to minimizing code
> duplication within a function (don't believe me? go look at the Linux
> kernel or any significant piece of C code). In fact, gotos are
> actually quite fast: if you know CPUs, you'll realize an unconditional
> goto is a simple jmp instruction, which does nothing to the CPU
> pipeline - conditional jumps are way more problematic. Some of those
> gotos are unnecessary (not removing code duplication), but I think
> they make the code easier to read, and I'm quite positive that any
> decent compiler will optimize away any unnecessary jumps. And code
> readability is far more important than that last 0.1% improvement.
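> To be concrete, the pattern I have in mind is the standard
> error-unwind idiom. This is a generic sketch, not code from the
> patch:
>
>   #include <stdio.h>
>   #include <stdlib.h>
>
>   static int
>   read_record (const char *path)
>   {
>     int err = -1;
>     FILE *fp = fopen (path, "r");
>     if (!fp)
>       return err;
>     char *buf = malloc (BUFSIZ);
>     if (!buf)
>       goto close_file;   /* one exit path; no duplicated fclose */
>     if (fread (buf, 1, BUFSIZ, fp) == 0 && ferror (fp))
>       goto free_buf;     /* likewise, no duplicated free + fclose */
>     err = 0;             /* success */
>   free_buf:
>     free (buf);
>   close_file:
>     fclose (fp);
>     return err;
>   }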
> I challenge you to show me one benchmark where the behaviour is
> demonstrably slower by even a significant percentage (let's say 3%)
> and is not caused by the branching over which comparison to use.
>
>> Better, I think, would be
>> to (1) search for a trailing suffix character, and compare based on
>> that, and then (2) fall back on numeric comparison only if the
>> trailing suffix characters match. This will certainly make plain
>> "sort -n" faster and will result in a less-intrusive change to the
>> existing code; arguably it will make the new code faster in many
>> cases, too.
>
> Did you read my explanation above for your exact "optimization"?
> Please explain to me, in a somewhat detailed manner, how this improves
> behaviour, because I'm failing to see the algorithm.
>
> Here's what I think you're suggesting:
>
>   for (each character c in string a)
>     if (c is not a digit)
>       break
>
>   for (each character d in string b)
>     if (d is not a digit)
>       break
>
>   if (suffix starting at c != suffix starting at d)
>     return which one is greater
>
>   perform the regular sort -n comparison between a & b
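> Or, in actual C, here's my reading of your proposal - a sketch only:
> suffix_rank is a hypothetical helper (the values are my guesses, not
> the patch's), numcompare stands in for the existing sort -n
> comparison, and I'm ignoring leading blanks and signs for brevity:
>
>   #include <ctype.h>
>
>   /* Hypothetical helper: rank of a suffix byte, 0 if none.  */
>   static int
>   suffix_rank (unsigned char c)
>   {
>     switch (c)
>       {
>       case 'k': case 'K': return 1;
>       case 'M': return 2;
>       case 'G': return 3;
>       default:  return 0;
>       }
>   }
>
>   /* sort.c's existing numeric comparison.  */
>   extern int numcompare (const char *a, const char *b);
>
>   static int
>   compare_human (const char *a, const char *b)
>   {
>     const char *pa = a;
>     const char *pb = b;
>     while (isdigit ((unsigned char) *pa))   /* pass 1: scan a's digits */
>       pa++;
>     while (isdigit ((unsigned char) *pb))   /* pass 1: scan b's digits */
>       pb++;
>     int diff = suffix_rank ((unsigned char) *pa)
>                - suffix_rank ((unsigned char) *pb);
>     if (diff)
>       return diff;             /* suffixes differ: one pass and done */
>     return numcompare (a, b);  /* suffixes match: scan the digits again */
>   }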
> You must be thinking that that last step is somehow magical and free,
> because I see it as once again looping over the string. You may be
> thinking to yourself - yeah, but there's no expensive comparison
> between the two strings. OK - I'll grant you that. So you've sped up
> the best case, where the suffixes don't match (more on this later).
> Now let's look at the worst case - the suffixes match. Then you have
> to scan over the string again, this time performing your numeric
> comparison. So you've made your worst case slower. I have a feeling
> that this worst case is also likely to be the average case.
>
> Also, let's say you really have managed to speed up the best case.
> But have you? What have you saved - the cost of a subtraction n times
> (n is the length of the number) over a span of x comparisons.
> Do you honestly believe that n * x will be sufficiently big in any
> reasonable test case that this saving will actually be realized? x,
> by the way, will be approximately m log m (where m is the number of
> strings). So your cost savings over the course of your call to sort
> will be n * m log m, for m strings of length n. Let's take a
> reasonable n = 3 (after all, a larger n on average means you will be
> using a different suffix anyway). Let's say you are sorting a million
> strings. Then you've saved about 3 * 1 000 000 * log2(1 000 000),
> or roughly 60 000 000 subtractions. Let's be generous here and say
> that a subtraction takes 20 clock cycles (I don't feel like digging
> up my FPGA work to figure out how many clock cycles my implementation
> took, let alone the far better ones used in modern processors). On a
> modern 2.0 GHz machine, that's about 10 ns per subtraction. So for a
> million strings, you've managed to save a whopping 600 000 000
> nanoseconds. Wait a minute - that's 0.6 seconds. So every million
> strings, your optimization, in the best case, will shave off 0.6
> seconds. Excellent. Oh - and m has a much larger impact, so sorting
> more strings will matter more than sorting longer strings.
>
> Now repeat the above analysis for the worst case, and I'm quite
> confident you'll see a much larger hit (although probably still not
> significant at all) in the worst case (suffixes equal).
>
> Hopefully I didn't embarrass myself with a glaring mistake - it's
> really late.
>
>> * Please don't take the address of an often-used local variable; this
>> hurts optimization, since it often inhibits optimizing that variable
>> into a register.
>
> Could you please provide me with context about where this is happening
> (I can't find it in the code)? A line number in the patch would be
> great.
>
>> * I suggest ignoring any suffix after the 'k' or 'M' or 'G' or whatever;
>> it's simpler.
>
> And extremely incorrect - please read the discussion above.
> Essentially, you then run into the problem of malformed input:
> "2Klingons" would be treated the same as "2K Klingons". Granted, this
> is a contrived example, but I hope you see my point.
>
>> * I also suggest allowing only the upper-case version (e.g., 'P', not
>> 'p'), as this will allow later support for ISO prefixes like 'p' (for
>> "pico") better. (It's OK to allow both 'k' and 'K' to mean kilo,
>> though.)
>
> Agreed, although in this context they are numeric suffixes (pico is
> the unit prefix, but units play no role in this discussion). The
> problem then becomes that I suddenly have to worry about which
> suffixes can be both and which must be one or the other.
> Additionally, you've got problems like mu for micro, which I'm not
> even sure can be represented in ASCII. Also, h and da are hardly
> used, I think (and even better, da breaks the pattern of
> one-character suffixes). So supporting the full set seems
> questionable.
>
>> * exponent_order should be a simple indexed lookup like "table[a]";
>> there shouldn't be a need for a loop checking for "inverse_table[i] ==
>> a".
>
> Is the memory-speed tradeoff really worth it? The table isn't that
> big to begin with, so you're trading 256 bytes for (in the best case)
> a savings of maybe five subtractions per suffix comparison. I'd
> probably concede your point on this tomorrow after a night's sleep.
> A sketch of what I take you to mean is at the bottom of this mail.
>
>> Hope this helps.
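> And here is that sketch of the direct lookup, as I understand the
> suggestion. The suffix set and values are my guesses, not the patch's
> actual table:
>
>   #include <limits.h>
>   #include <stddef.h>
>
>   /* order_table['K'] = 1, order_table['M'] = 2, ...;
>      every other byte maps to 0 (no suffix).  */
>   static int order_table[UCHAR_MAX + 1];
>
>   static void
>   init_order_table (void)
>   {
>     static const char suffixes[] = "KMGTPEZY";
>     for (size_t i = 0; suffixes[i]; i++)
>       order_table[(unsigned char) suffixes[i]] = i + 1;
>     order_table['k'] = order_table['K'];   /* both k and K mean kilo */
>   }
>
>   static int
>   exponent_order (unsigned char c)
>   {
>     return order_table[c];   /* single indexed load, no search loop */
>   }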