On Fri, Jan 29, 2016 at 12:46 PM, Peter Geoghegan <p...@heroku.com> wrote:
> On Fri, Jan 29, 2016 at 9:24 AM, Robert Haas <robertmh...@gmail.com> wrote:
>> I feel like this could be data driven.  I mean, the cost model is
>> based mainly on the tuple width and the size of the SortTuple array.
>> So, it should be possible to test both algorithms on 32, 64, 96,
>> 128, ... byte tuples with a SortTuple array that is 256MB, 512MB,
>> 768MB, 1GB, ...  Then we can judge how closely the cost model comes to
>> mimicking the actual behavior.
>
> You would also need to represent how much of the input actually ended
> up being sorted with the heap in each case.  Maybe that could be tested
> at 50% (bad for "quicksort with spillover"), 25% (better), and 5%
> (good).
>
> An alternative approach that might be acceptable is to add a generic,
> conservative 90% threshold (so 10% of tuples sorted by heap).
I don't quite know what you mean by these numbers.  Add a generic, conservative threshold to what?

Thinking about this some more, I really think we should think hard about going back to the strategy which you proposed and discarded in your original post: always generate the first run using replacement selection, and every subsequent run by quicksorting.  In that post you mention powerful advantages of this method: "even without a strong logical/physical correlation, the algorithm tends to produce runs that are about twice the size of work_mem.  (It's also notable that replacement selection only produces one run with mostly presorted input, even where input far exceeds work_mem, which is a neat trick.)"

You went on to dismiss that strategy, saying that "despite these upsides, replacement selection is obsolete, and should usually be avoided."  But I don't see that you've justified that statement.  It seems pretty easy to construct cases where quicksorting every run regresses, and a large percentage of those cases are precisely those where replacement selection would have produced a single run, avoiding the merge step altogether.  I think those cases are extremely important.  I'm quite willing to run somewhat more slowly in other cases to be certain of not regressing the case of completely or almost-completely ordered input.  Even if that didn't seem like a sufficient reason unto itself, I'd be willing to go that way just so we don't have to depend on a cost model that might easily go wrong due to bad input, even if it were theoretically perfect in every other respect (which I'm pretty sure is not true here anyway).

I also have another idea that might help squeeze more performance out of your approach and avoid regressions.  Suppose that we add a new GUC with a name like sort_mem_stretch_multiplier or something like that, with a default value of 2.0 or 4.0 or whatever we think is reasonable.  When we've written enough runs that a polyphase merge will be required, or when we're actually performing a polyphase merge, the amount of memory we're allowed to use increases by this multiple.  The idea is: we hope that people will set work_mem appropriately and consequently won't experience polyphase merges at all, but it might happen anyway.  However, it's almost certain not to happen very frequently.  Therefore, using extra memory in such cases should be acceptable, because while you might have every backend in the system using one or more copies of work_mem for something if the system is very busy, it is extremely unlikely that you will have more than a handful of processes doing polyphase merges.  (A rough sketch of the kind of logic I have in mind follows at the end of this message.)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
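
Here is the rough sketch mentioned above: a toy, standalone illustration of the memory-budget calculation, not a patch.  sort_mem_stretch_multiplier, sort_memory_budget(), and the runs-versus-merge-order test are all placeholders that don't exist in the tree today; the real change would of course live in tuplesort.c.

#include <stdio.h>
#include <stdbool.h>

/*
 * Proposed GUC (placeholder): multiplier applied to work_mem once a
 * multi-pass (polyphase) merge becomes unavoidable.
 */
static double sort_mem_stretch_multiplier = 2.0;

/*
 * Return the sort's memory budget in kilobytes.  The assumption here is
 * that a single merge pass can combine at most merge_order runs; once more
 * runs than that have been written, additional passes are needed, so from
 * that point on we allow work_mem * sort_mem_stretch_multiplier.
 */
static long
sort_memory_budget(long work_mem_kb, int runs_written, int merge_order)
{
	bool		multipass_merge = runs_written > merge_order;

	if (multipass_merge)
		return (long) (work_mem_kb * sort_mem_stretch_multiplier);

	return work_mem_kb;
}

int
main(void)
{
	/* 4MB work_mem, 7-way merge, 10 runs already written: budget doubles */
	printf("budget = %ld kB\n", sort_memory_budget(4096, 10, 7));

	return 0;
}

Run as-is, this prints "budget = 8192 kB" for a 4MB work_mem once the run count exceeds the merge order.  Whether the trigger should be "a second merge pass is coming" or "we're already performing one" is obviously up for discussion; the point is only that the stretched budget would apply to a small number of backends at any given time.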