On Thu, Dec 12, 2019 at 06:10:50PM -0800, Jeff Davis wrote:
> On Thu, 2019-11-28 at 18:46 +0100, Tomas Vondra wrote:
> > 13) As for this:
> >
> >      /* make sure that we don't exhaust the hash bits */
> >      if (partition_bits + input_bits >= 32)
> >          partition_bits = 32 - input_bits;
> >
> > We already ran into this issue (exhausting bits in a hash value) in
> > hashjoin batching; we should be careful to use the same approach in
> > both places (not the same code, just the general approach).
>
> I assume you're talking about ExecHashIncreaseNumBatches(), and in
> particular, commit 8442317b. But that's a 10-year-old commit, so
> perhaps you're talking about something else?
>
> It looks like that code in hashjoin is protecting against having a very
> large number of batches, such that we can't allocate an array of
> pointers for each batch. And it seems like the concern is more related
> to a planner error causing such a large nbatch.
>
> I don't quite see the analogous case in HashAgg. npartitions is already
> constrained to a maximum of 256. And the batches are individually
> allocated, held in a list, not an array.
>
> It could perhaps use some defensive programming to make sure that we
> don't run into problems if the max is set very high.
>
> Can you clarify what you're looking for here?


I'm talking about this recent discussion on pgsql-bugs:

https://www.postgresql.org/message-id/CA%2BhUKGLyafKXBMFqZCSeYikPbdYURbwr%2BjP6TAy8sY-8LO0V%2BQ%40mail.gmail.com

I.e. when the number of batches/partitions and the number of buckets get
high enough, we may end up with very few hash bits left for one of the
two parts (the partition/batch number or the bucket number).
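To make the arithmetic concrete, here's a minimal standalone sketch. The
variable names mirror the snippet quoted above, but the shift-and-mask
scheme for picking the partition is just my reading of it, not the
actual patch code:

    #include <stdio.h>
    #include <stdint.h>

    int
    main(void)
    {
        uint32_t    hashvalue = 0xDEADBEEF; /* some 32-bit hash value */
        int         input_bits = 28;     /* bits already used for buckets */
        int         partition_bits = 8;  /* we'd like 256 partitions ... */
        uint32_t    npartitions;
        uint32_t    partition;

        /* the clamp from the snippet above: don't exhaust the hash bits */
        if (partition_bits + input_bits >= 32)
            partition_bits = 32 - input_bits;   /* only 4 bits remain */

        /*
         * Assume the partition number is taken from the bits just above
         * the input_bits consumed by bucket selection.
         */
        npartitions = UINT32_C(1) << partition_bits;
        partition = (hashvalue >> input_bits) & (npartitions - 1);

        /* prints: partition_bits = 4, npartitions = 16, partition = 13 */
        printf("partition_bits = %d, npartitions = %u, partition = %u\n",
               partition_bits, npartitions, partition);

        return 0;
    }

So although 256 partitions were requested, the clamp silently cuts that
to 16, and with even larger input_bits the partition count can drop
below HASH_MIN_PARTITIONS.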

> Perhaps I can also add a comment saying that we can have less than
> HASH_MIN_PARTITIONS when running out of bits.


Maybe.


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

