Re: Default setting for enable_hashagg_disk

Peter Geoghegan Sat, 11 Jul 2020 16:47:14 -0700

On Fri, Jul 10, 2020 at 10:00 PM David Rowley <dgrowle...@gmail.com> wrote:
> hmm yeah. It's unfortunate, but I'm not sure how I'd have implemented
> it differently.  The problem is made worse by the fact that we'll only
> release the memory for the hash table during ExecEndHashJoin(). If the
> planner had some ability to provide the executor with knowledge that
> the node would never be rescanned, then the executor could release the
> memory for the hash table after the join is complete. For now, we'll
> need to live with the fact that an Append containing many children
> doing hash joins will mean holding onto all that memory until the
> executor is shutdown :-(
>
> There's room to make improvements there, for sure, but not for PG13.


I think that we're stuck with the idea that partitionwise join uses up
to one work_mem allocation per partition until we deprecate work_mem
as a concept.

Anyway, I only talked about partitionwise join because that was your
example. I could just as easily have picked on parallel hash join
instead, which is something that I was involved in myself (kind of).
This is more or less a negative consequence of the incremental
approach we have taken here, which is a collective failure.

I have difficulty accepting that something like hash_mem_multiplier
cannot be accepted because it risks making the consequence of
questionable designs even worse. The problem remains that the original
assumption just isn't very robust, and isn't something that the user
has granular control over. In general it makes sense that a design in
a stable branch is assumed to be the norm that new things need to
respect, and not the other way around. But there must be some limit to
how far that's taken.

> It sounds interesting, but it also sounds like a new feature
> post-beta. Perhaps it's better we minimise the scope of the change to
> be a minimal fix just for the behaviour we predict some users might
> not like.

That's an understandable interpretation of the
hash_mem/hash_mem_multiplier proposal on the table, and yet one that I
disagree with. I consider it highly desirable to have a GUC that can
be tuned in a generic and high level way, on general principle. We
don't really do escape hatches, and I'd rather avoid adding one now
(though it's far preferable to doing nothing, which I consider totally
out of the question).

Pursuing what you called hashagg_mem is a compromise that will make
neither of us happy. It seems like an escape hatch by another name. I
would rather just go with your original proposal instead, especially
if that's the only thing that'll resolve the problem in front of us.

-- 
Peter Geoghegan

Re: Default setting for enable_hashagg_disk

Reply via email to