On Fri, Nov 17, 2017 at 9:22 PM, Peter Geoghegan <p...@bowt.ie> wrote:
> I think that it's reasonable for us to make it a goal of the executor
> to have operations that have a smooth cost function, in order to
> manage the risk of misestimation well, and to make it a goal to have
> operations that are otherwise adaptive to misestimation.
Hash joins are a place where we could have a smoother cost function
than we do. When we run out of memory, instead of switching from (say)
a single batch to two batches, switch to 64 batches, but initially
keep 63 of them in memory and only write the very last one to disk.
Every time we again run out of memory, dump another batch to disk. If
we end up dumping more than half or so of the batches to disk, switch
to an even larger number of batches to make the partitioning even more
fine-grained. The current system is bad because you jump from spooling
NO tuples to a tuplestore to spooling HALF of the inner AND outer
tuples to a tuplestore. If the hash table is just a little too big to
fit, we could write 1/64 or 2/64 or 3/64 of the inner and outer tuples
to a tuplestore instead of HALF of them, which would be a huge win.
(There's a toy sketch of this idea below my signature.)

That having been said, I think the place where our plans most commonly
go wrong is where we incorrectly estimate the number of tuples by
multiple orders of magnitude - 100x is common, 1000x is common, a
million x is not uncommon, and even a billion x is not unheard-of. And
I don't think there's any way to make a hash join happy if it thinks
it's going to need 1 batch and it ends up needing a million batches.
At that point, even if the cost function is very smooth, you've moved
so far along the curve that you're probably not in a good place.

So, while I think that smoothing out the cost functions is a good
idea, I think we also need to consider what more can be done to
improve the estimates - and especially to avoid estimates that are off
by huge multiples.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
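
P.S. For concreteness, here is a toy, self-contained C sketch of the
incremental spilling scheme described above. This is not PostgreSQL
code: the HashState struct, the helper names, and the constants (64
batches, a memory budget counted in whole tuples) are all made up for
illustration, and a real implementation would also have to repartition
when most batches are spilled and route tuples of spilled batches into
tuplestores rather than dropping them.

/*
 * Toy sketch: partition into many batches up front, but push batches
 * to "disk" one at a time, from the highest batch number down, only
 * as memory pressure demands.  (Hypothetical names and constants.)
 */
#include <stdio.h>

#define NBATCH      64          /* initial number of batches */
#define MEM_LIMIT   1000        /* pretend work_mem, in tuples */

typedef struct
{
    int     nbatch;             /* total number of batches */
    int     first_spilled;      /* batches >= this index are on disk */
    long    batch_size[NBATCH]; /* tuples currently held per batch */
    long    mem_used;           /* total in-memory tuples */
} HashState;

/* Pretend to write one batch to a tuplestore and free its memory. */
static void
spill_one_batch(HashState *hs)
{
    int     victim = --hs->first_spilled;

    printf("spilling batch %d (%ld tuples) to disk\n",
           victim, hs->batch_size[victim]);
    hs->mem_used -= hs->batch_size[victim];
    hs->batch_size[victim] = 0;
}

/* Insert one tuple, spilling a single batch at a time when over budget. */
static void
insert_tuple(HashState *hs, unsigned hashvalue)
{
    int     batchno = hashvalue % hs->nbatch;

    if (batchno < hs->first_spilled)
    {
        hs->batch_size[batchno]++;
        hs->mem_used++;

        while (hs->mem_used > MEM_LIMIT && hs->first_spilled > 0)
            spill_one_batch(hs);

        /*
         * Here a real implementation would check whether more than
         * half the batches are already on disk and, if so, switch to
         * an even larger number of batches.
         */
    }
    /* else: tuple belongs to a spilled batch and would go straight to disk */
}

int
main(void)
{
    HashState   hs = { .nbatch = NBATCH, .first_spilled = NBATCH };

    /* Feed in more tuples than fit in the pretend memory budget. */
    for (unsigned i = 0; i < 1200; i++)
        insert_tuple(&hs, i * 2654435761u);     /* crude hash */

    printf("%d of %d batches remain in memory, %ld tuples in memory\n",
           hs.first_spilled, hs.nbatch, hs.mem_used);
    return 0;
}

The point is just the shape of the memory-pressure path: each overflow
evicts one batch off the top, so the fraction of tuples that hits disk
grows in steps of roughly 1/64 instead of jumping straight from none
to half.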