> That's worth trying out. It might also then be worth trying to push both > unordered values -- the big one up / the small one down. I've seen other > implementations do that, but don't remember where, or what it's called.
It is important that we do not do 2 compares two avoid one copy (assignment to temporary) as you did in your patch earlier in this thread, cause compares are usually pretty costly (also two compares are then inlined, bloating the binary). Assigning a sort tuple to a temporary translates to pretty simple assembly code, so my suggested modification should outperform. It cuts down the cost of the inner loop by ca. 40% comparing the assembly. And it avoids having to re-read memory on each comparison, as the temporary can be held in registers. > "Unbounded" means no bounds check on the array. That's not possible in its > current form, so I think you misunderstood something. Sorry for the confusion. I didn't mean unbounded in the "array bound checking" sense, but in the "unrestricted number of loops" sense. > I only remember implementations tracking loop iterations, not swaps. You'd > need evidence that this is better. Well, the idea was to include the presorted check somehow. Stopping after a certain number of iterations is surely more safe than stopping after a number of swaps, but we would then implicitly also stop our presort check. We could change that though: Count loop iterations and on bailout continue with a pure presort check, but from the last position of the insertion sort -- not all over again -- by comparing against the maximum value recorded during the insertion sort. Thoughts? > An important part not mentioned yet: This might only be worth doing if the > previous recursion level performed no swaps during partitioning and the > current pivot candidates are ordered. Agreed. > It's currently 7, but should really be something like 10. A script that > repeats tests for, say, 7 through 18 should show a concave-up shape in the > times. The bottom of the bowl should shift to higher values, and that minimum > is what should be compared. Yeah, as alluded to before, it should be closer to 10 nowadays. In any case it should be changed at least from 7 to 8, cause then we could at least discard the additional check for n > 7 in the quicksort code path (see /src/include/lib/sort_template.h#L322). Currently we check n < 7 and a few lines down we check for n > 7, if we check n < 8 for insertion sort then the second check becomes obsolete. Benjamin Coutu http://www.zeyos.com