On Sat, Jul 11, 2020 at 09:02:43AM -0700, David G. Johnston wrote:
> On Sat, Jul 11, 2020 at 7:22 AM Stephen Frost <sfr...@snowman.net> wrote:
>
>> There now seem to be some suggestions that not only should we have a
>> new GUC, but we should default to having it not be equal to work_mem (or
>> 1.0 or whatever) and instead be higher, *twice* the existing work_mem
>> setting or larger, meaning that people whose systems are working just
>> fine, who have good estimates that represent their workload, and who get
>> the plans they want may then start seeing differences and increased
>> memory utilization in places that they *don't* want that, all because
>> we're scared that someone, somewhere, might see a regression due to
>> HashAgg spilling to disk.
>
>
> If that increased memory footprint allows the planner to give me a better
> plan with faster execution and with no OOM, I'd be very happy that this
> change happened. While having a more flexible memory allocation framework
> is not a primary goal in and of itself, it is a nice side-effect.  I'm not
> going to say "let's only set work_mem to 32MB instead of 48MB so I can
> avoid this faster HashAgg node and instead execute a nested loop (or
> whatever)".  More probable is the user whose current nested loop plan is
> fast enough and who doesn't even realize that with a bit more memory they
> could get a HashAgg that performs 15% faster.  For them this is a win on
> its face.
>
> I don't believe this negatively impacts the super-admin in our user base,
> and it is a decent win for the average and below-average admin.
>
> Do we really have an issue with plans chosen with access to more memory
> being slower than plans chosen with less memory?
>
> The main risk here is that we choose for a user to consume more memory than
> they expected and they report OOM issues to us.  We tell them to set this
> new GUC to 1.0.  But that implies they are getting many non-HashAgg plans
> produced when with a bit more memory those HashAgg plans would have been
> chosen.  If they get those faster plans without OOM it's a win; if it OOMs
> it's a loss.  I'm feeling optimistic here that we'll get considerably more
> wins than losses.  How loss-averse do we need to be here, though?  Note we
> can give the upgrading user advance notice of our loss-aversion level and
> they can simply disagree and set it to 1.0 and/or perform more thorough
> testing.  So being optimistic feels like the right choice.


I don't know, but one of the main arguments against simply suggesting that
people bump up work_mem (if they're hit by the HashAgg spill in v13) was
that it'd increase overall memory usage for them. It seems strange to then
propose a new GUC set to a default that would result in higher memory
usage *for everyone*.
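
To make that concrete, assuming the knob ends up as a multiplier on
work_mem (say, hash_mem_multiplier - the name and the 2.0 default are
assumptions on my part, nothing is settled in this thread), the two
options being compared look roughly like this:

    -- today's workaround: raise the limit for *all* node types
    SET work_mem = '96MB';

    -- proposed knob: raise the limit only for hash-based nodes,
    -- i.e. effective hash limit = work_mem * multiplier
    SET work_mem = '48MB';
    SET hash_mem_multiplier = 2.0;

A default above 1.0 is what "higher memory usage *for everyone*" means
in practice: every hash node in every query gets to use more memory
before spilling, even for users who never asked for it.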

Of course, having such a GUC with a default that is a multiple of work_mem
might be a win overall - or maybe not. I don't have a very good idea how
many people will get bitten by this, and how many will get a speedup (and
how significant that speedup would be).
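
To put rough numbers on it (purely illustrative figures, not
measurements): with work_mem = 48MB and a 2.0 multiplier, each hash
node may use ~96MB before spilling instead of ~48MB. A query with
three hash nodes running on 20 backends could then, in the worst case,
use about 20 * 3 * 96MB = ~5.6GB for hashing instead of ~2.8GB,
without anyone having changed a single setting.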


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
