On Fri, Apr 21, 2023 at 9:50 AM Tom Lane <t...@sss.pgh.pa.us> wrote: > > Jehan-Guillaume de Rorthais <j...@dalibo.com> writes: > > After catching up with this thread, where pending bugs are listed and > > discussed, > > I wonder if the current patches trying to lower the HashJoin memory > > explosion[1] > > could be added to the "Older bugs affecting stable branches" list of > > https://wiki.postgresql.org/wiki/PostgreSQL_16_Open_Items as I think they > > deserve some discussion/triage for v16? > > They do not. That patch is clearly nowhere near ready to commit, and > even if it was, I don't think we'd consider it post-feature-freeze. > Any improvement in this space would be a feature, not a bug fix, > despite anyone's attempts to label it a bug fix.
So, I think this may be a bit harsh. The second patch in the set only moves hash join batch buffile creation into a more granular memory context to make it easier to identify instances of this bug (which causes OOMs). It is missing a parallel hash join implementation and a bit more review. But it is not changing any behavior. If using a separate memory context solely for the purpose of accounting is considered an anti-pattern, we could use some arithmetic like hash_agg_update_metrics() to calculate how much space is taken up by these temporary file buffers. Ultimately, either method is a relatively small change (both LOC and impact AFAICT). Currently, it isn't possible for a user to understand what is consuming so much memory when hash join batch file buffers substantially exceed the size of the actual hashtable. This memory usage is not displayed in EXPLAIN ANALYZE or anywhere else. I think adding a debugging message with some advice for is a reasonable concession to the user. This may not constitute a bug "fix", but I don't really see how this is a feature. - Melanie