On Tue, Jun 10, 2014 at 1:43 PM, Jeff Janes <jeff.ja...@gmail.com> wrote:
> On Tue, Jun 10, 2014 at 5:17 AM, Robert Haas <robertmh...@gmail.com> wrote:
>> The problem case is when you have 1 batch and the increased memory
>> consumption causes you to switch to 2 batches.  That's expensive.  It
>> seems clear based on previous testing that *on the average*
>> NTUP_PER_BUCKET = 1 will be better, but in the case where it causes an
>> increase in the number of batches it will be much worse - particularly
>> because the only way we ever increase the number of batches is to
>> double it, which is almost always going to be a huge loss.
>
> Is there a reason we don't do hybrid hashing, where if 80% fits in memory
> then we write out only the 20% that doesn't?  And then when probing the
> table with the other input, the 80% that land in in-memory buckets get
> handled immediately, and only the 20% that land in the on-disk buckets
> get written for the next step?
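To make the question concrete, here is a toy sketch (mine, not anything in the tree) of the hybrid-hash behaviour being described: partition the build side, keep whichever partitions fit within a small budget, and spill only the overflowing ones; during the probe, rows that hit a resident partition are joined right away and the rest are deferred to a later pass.  The constants NPARTITIONS and MEM_BUDGET and the integer-key setup are invented purely for illustration.

/*
 * Hypothetical, heavily simplified hybrid-hash sketch: keep the partitions
 * that fit in memory, spill only the rest.  Keys are plain integers and the
 * "disk" is just another in-memory list; none of this is PostgreSQL code.
 */
#include <stdio.h>
#include <stdlib.h>

#define NPARTITIONS 8
#define MEM_BUDGET  6            /* max build tuples kept resident */

typedef struct { int nkeys; int keys[16]; } Partition;

static Partition resident[NPARTITIONS];   /* partitions kept in memory */
static Partition spilled[NPARTITIONS];    /* partitions "written to disk" */
static int       is_resident[NPARTITIONS];

static int part_for(int key) { return (unsigned) key % NPARTITIONS; }

int
main(void)
{
    int build[] = {1, 9, 17, 2, 10, 3, 11, 4, 12, 5};
    int probe[] = {9, 2, 7, 17, 12, 40};
    int nbuild = sizeof(build) / sizeof(build[0]);
    int nprobe = sizeof(probe) / sizeof(probe[0]);
    int in_memory = 0;

    for (int i = 0; i < NPARTITIONS; i++)
        is_resident[i] = 1;

    /* Build phase: fill partitions until the budget is hit, then divert
     * whole partitions (current contents and future tuples) to "disk". */
    for (int i = 0; i < nbuild; i++)
    {
        int p = part_for(build[i]);

        if (is_resident[p] && in_memory >= MEM_BUDGET)
        {
            is_resident[p] = 0;
            in_memory -= resident[p].nkeys;
            for (int k = 0; k < resident[p].nkeys; k++)
                spilled[p].keys[spilled[p].nkeys++] = resident[p].keys[k];
            resident[p].nkeys = 0;
        }

        if (is_resident[p])
        {
            resident[p].keys[resident[p].nkeys++] = build[i];
            in_memory++;
        }
        else
            spilled[p].keys[spilled[p].nkeys++] = build[i];
    }

    /* Probe phase: rows hashing to resident partitions are joined
     * immediately; the rest would be written out for a later pass. */
    for (int i = 0; i < nprobe; i++)
    {
        int p = part_for(probe[i]);

        if (is_resident[p])
        {
            for (int k = 0; k < resident[p].nkeys; k++)
                if (resident[p].keys[k] == probe[i])
                    printf("match in memory: %d\n", probe[i]);
        }
        else
            printf("deferred to disk pass: %d\n", probe[i]);
    }
    return 0;
}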
We have an optimization that is a little bit like that.  The "skew" hash
join stuff tries to (essentially) ensure that the MCVs are in the first
batch.  But more could probably be done.  For example, suppose we have 256
buckets.  If the hash table overflows work_mem, we could write the contents
of *one bucket* out to disk, rather than (as we currently do) half of the
table.  If we overflow again, we write another bucket.  When the number of
buckets written reaches half the total, we split all of the remaining
buckets so that all 256 slots are once again active.  Repeat as needed.  If
something like that worked out, it would drastically reduce the penalty for
slightly overrunning work_mem.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
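A rough sketch of the bucket-at-a-time spill idea described above, again mine rather than anything that exists: when the in-memory table exceeds its budget, push the tuples of a single bucket to a "batch file" instead of writing out half of the table.  The bucket count, memory limit, and pick-the-fullest-bucket eviction policy are all made up, and the re-splitting step once half the buckets are gone is only noted in a comment.

/*
 * Toy illustration (not PostgreSQL executor code) of spilling one bucket
 * at a time once the table exceeds its memory budget.
 */
#include <stdio.h>
#include <stdlib.h>

#define NBUCKETS   16            /* stand-in for the 256 in the mail */
#define MEM_TUPLES 12            /* stand-in for work_mem, in tuples */

typedef struct Tuple { int key; struct Tuple *next; } Tuple;

static Tuple *bucket[NBUCKETS];
static int    evicted[NBUCKETS]; /* bucket already written to the batch? */
static int    ntuples_in_mem;
static int    nevicted;

/* Pretend to write one bucket's tuples to a batch file on disk. */
static void
spill_one_bucket(void)
{
    int best = -1, best_len = 0;

    /* Pick the fullest resident bucket; any policy would do for a sketch. */
    for (int i = 0; i < NBUCKETS; i++)
    {
        int len = 0;

        for (Tuple *t = bucket[i]; t; t = t->next)
            len++;
        if (!evicted[i] && len > best_len)
        {
            best = i;
            best_len = len;
        }
    }
    if (best < 0)
        return;

    for (Tuple *t = bucket[best]; t;)
    {
        Tuple *next = t->next;

        printf("spill key %d from bucket %d\n", t->key, best);
        free(t);
        t = next;
    }
    bucket[best] = NULL;
    evicted[best] = 1;
    ntuples_in_mem -= best_len;
    nevicted++;

    /*
     * In the scheme described above, once nevicted reaches NBUCKETS / 2
     * the remaining buckets would each be split in two so that all
     * NBUCKETS slots are active again; that step is omitted here.
     */
}

static void
insert_tuple(int key)
{
    int b = (unsigned) key % NBUCKETS;
    Tuple *t;

    while (ntuples_in_mem >= MEM_TUPLES)
        spill_one_bucket();

    if (evicted[b])
    {
        /* Tuples for an evicted bucket go straight to its batch file. */
        printf("spill key %d from bucket %d\n", key, b);
        return;
    }
    t = malloc(sizeof(Tuple));
    t->key = key;
    t->next = bucket[b];
    bucket[b] = t;
    ntuples_in_mem++;
}

int
main(void)
{
    for (int key = 0; key < 40; key++)
        insert_tuple(key * 7);
    printf("%d tuples resident, %d buckets spilled\n",
           ntuples_in_mem, nevicted);
    return 0;
}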