On Thu, Sep 11, 2014 at 10:11 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Robert Haas <robertmh...@gmail.com> writes:
>> On Thu, Sep 11, 2014 at 9:59 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>>> Robert Haas <robertmh...@gmail.com> writes:
>>>> (3) It allows the number of batches to increase on the fly while the
>>>> hash join is in process.
>
>>> Pardon me for not having read the patch yet, but what part of (3)
>>> wasn't there already?
>
>> EINSUFFICIENTCAFFEINE.
>
>> It allows the number of BUCKETS to increase, not the number of
>> batches.  As you say, the number of batches could already increase.
>
> Ah.  Well, that would mean that we need a heuristic for deciding when to
> increase the number of buckets versus the number of batches ... seems
> like a difficult decision.
Well, the patch takes the approach of increasing the number of buckets
when (1) there's only one batch and (2) the load factor exceeds
NTUP_PER_BUCKET.  As Tomas points out, it's just increasing the number
of buckets to the exact same number we would have allocated had we known
up front that this many tuples would show up.

Now, there is a possibility that we could lose.  Let's suppose that the
tuples overflow work_mem, but just barely.  If we've accurately
estimated how many tuples there are, the patch changes nothing: either
way we're going to need two batches.  But let's say we've underestimated
the number of tuples by 3x.  Then it could be that without the patch
we'd allocate fewer buckets, never increase the number of buckets, and
avoid batching; whereas with the patch, we'll discover that our tuple
estimate was wrong, increase the number of buckets on the fly, and then
be forced by the slightly-increased memory consumption that results from
increasing the number of buckets to do two batches.  That would suck.

But catering to that case is basically hoping that we'll fall into a
septic tank and come out smelling like a rose - that is, we're hoping
that our initial poor estimate will cause us to accidentally do the
right thing later.  I don't think that's the way to go.  It's much more
important to get right the case where things are bigger than we expected
but still fit within work_mem; we're currently taking a huge run-time
penalty in that case, as Tomas's results show.

If we wanted to cater to the other case in the future, we could consider
the opposite approach: if work_mem is exhausted, throw the bucket
headers away and keep reading tuples into the dense-packed chunks added
by yesterday's commit.  If we again run out of work_mem, then we
*definitely* need to increase the batch count.  If we manage to make
everything fit, then we know exactly how many bucket headers we can
afford, and can use some heuristic to decide between that and using more
batches.  I don't think that should be a requirement for this patch,
though: I think the cumulative effect of Tomas's patches is going to be
a win *much* more often than it's a loss.

In 100% of cases, the dense-packing stuff will make a hash table
containing the same tuples significantly smaller.  Except when we've
radically overestimated the tuple count, the change to NTUP_PER_BUCKET
will mean less time spent chasing hash chains; it does use more memory,
but that's more than paid for by the first change.  Both the
dense-packing stuff and the change to include bucket headers in the
work_mem calculation will make the work_mem bound on memory utilization
far more accurate than it has ever been previously.  And the change to
increase the number of buckets at runtime will mean that we're
significantly more resilient against the planner underestimating the
number of tuples.  That's clearly good.  The fact that we can construct
borderline cases where it loses shouldn't deter us from pushing forward.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
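
For concreteness, here is a minimal C sketch of the bucket-growth rule
described above.  It is not the patch (the real logic lives in
PostgreSQL's hash join code, src/backend/executor/nodeHash.c): the
HashTableSketch struct, the should_increase_buckets() helper, the
NTUP_PER_BUCKET value of 1, and the simulated load loop are all invented
for illustration; only the rule itself (grow the bucket array while
there is still a single batch and the load factor has exceeded
NTUP_PER_BUCKET) comes from the discussion.

/*
 * Illustrative sketch only: not the code from the patch.  The names
 * NTUP_PER_BUCKET, nbatch, and nbuckets mirror PostgreSQL's hash join,
 * but the struct, helper, and simulation below are invented for this
 * example.
 */
#include <stdbool.h>
#include <stdio.h>

#define NTUP_PER_BUCKET 1       /* illustrative target load factor */

typedef struct HashTableSketch
{
    int     nbatch;             /* number of batches in use */
    long    nbuckets;           /* current number of bucket headers */
    long    ntuples;            /* tuples loaded into the table so far */
} HashTableSketch;

/*
 * The heuristic from the discussion: grow the bucket array only while
 * everything still fits in a single batch and the load factor has
 * exceeded NTUP_PER_BUCKET.
 */
static bool
should_increase_buckets(const HashTableSketch *ht)
{
    return ht->nbatch == 1 &&
           ht->ntuples > ht->nbuckets * NTUP_PER_BUCKET;
}

int
main(void)
{
    /* Planner "expected" ~1024 tuples; simulate 3x that showing up. */
    HashTableSketch ht = { .nbatch = 1, .nbuckets = 1024, .ntuples = 0 };

    for (long i = 0; i < 3 * 1024; i++)
    {
        ht.ntuples++;
        if (should_increase_buckets(&ht))
        {
            ht.nbuckets *= 2;   /* rebuild with twice as many buckets */
            printf("grew to %ld buckets at %ld tuples\n",
                   ht.nbuckets, ht.ntuples);
        }
    }
    return 0;
}

Doubling on each trigger means only a handful of rebuilds even for a
large underestimate, which is in the same spirit as "increasing the
number of buckets to the exact same number we would have allocated had
we known" above.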