On Wed, Apr 29, 2020 at 4:39 PM Melanie Plageman
<melanieplage...@gmail.com> wrote:
>
> In addition to many assorted TODOs in the code, there are a few major
> projects left:
> - Batch 0 falling back
> - Stripe barrier deadlock
> - Performance improvements and testing
>

Batch 0 never spills.  That behavior is an artifact of the existing design,
which special-cases batch 0 as an optimization: its tuples fill the initial
hash table directly, so it never has to be loaded from disk and doesn't need a
batch file.

However, in the pathological case where all tuples hash to batch 0, there is
no way to redistribute those tuples to other batches. So the existing hash
join implementation allows work_mem to be exceeded for batch 0.

The adaptive hash join approach offers another way to deal with a batch that
exceeds work_mem: if increasing the number of batches does not help, the batch
can be split into stripes, each of which fits within work_mem. Doing this
requires spilling the excess tuples to batch files. The attached patch adds
the logic to create a batch 0 file for serial hash join so that, even in the
pathological case, batch 0 does not need to exceed work_mem.
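
To illustrate the idea (this is not the patch itself), here is a rough,
self-contained C sketch of the spill decision for batch 0. The names
(HashTableState, insert_batch0_tuple, batch0_file) are made up for the
example; the real code works with PostgreSQL's hash join state and batch
file machinery:

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct Tuple { size_t size; char data[56]; } Tuple;

typedef struct HashTableState {
    size_t space_used;      /* bytes currently held in memory            */
    size_t space_allowed;   /* work_mem-style limit                      */
    FILE  *batch0_file;     /* spill file for batch 0, NULL until needed */
} HashTableState;

/*
 * Insert a batch 0 tuple.  If it fits within the memory limit, keep it in
 * the in-memory hash table; otherwise append it to a batch 0 spill file so
 * it can be processed later, one stripe at a time.
 */
static void
insert_batch0_tuple(HashTableState *hs, const Tuple *tup)
{
    if (hs->space_used + tup->size <= hs->space_allowed)
    {
        /* (real code would link the tuple into a hash bucket here) */
        hs->space_used += tup->size;
        return;
    }

    /* Over the limit: create the batch 0 file lazily and spill. */
    if (hs->batch0_file == NULL)
    {
        hs->batch0_file = tmpfile();
        if (hs->batch0_file == NULL)
        {
            perror("tmpfile");
            exit(EXIT_FAILURE);
        }
    }
    fwrite(tup, sizeof(Tuple), 1, hs->batch0_file);
}

int
main(void)
{
    HashTableState hs = { .space_used = 0, .space_allowed = 128,
                          .batch0_file = NULL };
    Tuple t = { .size = 64 };

    insert_batch0_tuple(&hs, &t);   /* fits in memory      */
    insert_batch0_tuple(&hs, &t);   /* fits in memory      */
    insert_batch0_tuple(&hs, &t);   /* spilled to the file */

    printf("in-memory bytes: %zu, spilled: %s\n",
           hs.space_used, hs.batch0_file ? "yes" : "no");
    return 0;
}

The point of the sketch is only that batch 0 gets a spill file created on
demand once the limit is hit, instead of being allowed to grow without bound.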

Thanks,
David

Attachment: v6-0002-Implement-fallback-of-batch-0-for-serial-adaptive.patch
Description: Binary data
