On Tue, Jun 10, 2014 at 5:17 AM, Robert Haas <robertmh...@gmail.com> wrote:

> The problem case is when you have 1 batch and the increased memory
> consumption causes you to switch to 2 batches.  That's expensive.  It
> seems clear based on previous testing that *on the average*
> NTUP_PER_BUCKET = 1 will be better, but in the case where it causes an
> increase in the number of batches it will be much worse - particularly
> because the only way we ever increase the number of batches is to
> double it, which is almost always going to be a huge loss.
>

Is there a reason we don't do hybrid hashing, where if 80% fits in memory
then we write out only the 20% that doesn't? Then, when probing the hash
table with the other input, the 80% of tuples that land in in-memory
buckets get joined immediately, and only the 20% that land in the on-disk
partitions get written out for a later pass.

Obviously no one has implemented it yet, but is there a fundamental reason
for that, or is it just a round-tuit problem?

Cheers,

Jeff
