On Tue, Sep 03, 2019 at 11:04:43AM +1200, Thomas Munro wrote:
On Tue, Sep 3, 2019 at 5:20 AM Tomas Vondra
<tomas.von...@2ndquadrant.com> wrote:
FWIW it's not clear to me why the cost would need to be recomputed after
constructing the parallel version of the plan? My understanding is that
the idea is to do cost-based planning for the serial plan, and then just
"mechanically" construct a parallel plan. Although, maybe there could be
multiple parallel alternatives ...
Presumably you still need to choose between the serial and parallel
plans by comparing costs. You lose some by adding exchange operators,
but you win some by dividing cardinality estimates.
Oh, right. Silly me.
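Just for intuition, here is a throwaway standalone C sketch of that trade-off (the constants are completely made up and this is nothing like the real costing code): the parallel side divides the per-tuple work across processes, but pays for the added exchange/Gather step.

/* Toy illustration only, not PostgreSQL's costing code; all constants are
 * invented.  The parallel plan "wins some by dividing cardinality estimates"
 * and "loses some by adding exchange operators". */
#include <stdio.h>

int
main(void)
{
    double  rows = 1e6;                /* estimated input rows */
    double  cpu_tuple_cost = 0.01;
    double  serial_cost = rows * cpu_tuple_cost;

    int     workers = 4;
    double  gather_startup = 1000.0;   /* adding the exchange operator */
    double  gather_per_tuple = 0.001;  /* per-tuple transfer overhead */

    /* each process handles roughly rows / (workers + leader) */
    double  rows_per_proc = rows / (workers + 1);
    double  parallel_cost = rows_per_proc * cpu_tuple_cost
                            + gather_startup
                            + rows * gather_per_tuple;

    printf("serial:   %.0f\n", serial_cost);
    printf("parallel: %.0f\n", parallel_cost);
    printf("choose:   %s\n", parallel_cost < serial_cost ? "parallel" : "serial");
    return 0;
}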
>This is sort of fudging together of ideas from conversations with
>Kevin Grittner (who talked about admission control a few years back),
>Peter Geoghegan (who mentioned opportunistically asking for more), and
>things I've heard of on SQL Server ("memory grants"). I think it
>would provide some relief from the problems we see today: it's hard to
>set work_mem so that you never get OOM but you can still use a decent
>amount of your precious memory, especially with mixed parallel and
>non-parallel query workloads thanks to our current
>work_mem-multiplying design.
I think this is probably the simplest and most realistic first step.
Whenever I thought about memory acquisition, I assumed we'd monitor how much
memory the plan is expected to use while we're constructing it. My main
problem was what to do when we reach the per-query limit - whether to
(a) simply reject the plan, (b) go back and see if we can replan with lower
work_mem (but by how much, and for which nodes?), or (c) just continue.
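A rough standalone sketch of what that monitoring might look like, just to make the (a)/(b)/(c) choice concrete - none of these types or functions exist in PostgreSQL:

/* Standalone sketch only.  Every memory-hungry node (hash, sort, ...) is
 * charged against a per-query budget as the plan is built, and when the
 * budget is exceeded we apply one of the (a)/(b)/(c) policies. */
#include <stdbool.h>
#include <stdio.h>

typedef enum
{
    LIMIT_REJECT,       /* (a) simply reject the plan */
    LIMIT_REPLAN,       /* (b) caller retries with lower work_mem */
    LIMIT_CONTINUE      /* (c) accept it, rely on execution-time caps */
} limit_policy;

typedef struct
{
    size_t  per_query_limit;    /* e.g. session_work_mem, in bytes */
    size_t  planned_so_far;     /* sum of per-node memory estimates */
} mem_budget;

/* returns false if this node pushed the plan over the limit and the
 * policy says to bail out */
static bool
budget_add_node(mem_budget *b, size_t node_estimate, limit_policy policy)
{
    b->planned_so_far += node_estimate;
    if (b->planned_so_far <= b->per_query_limit)
        return true;
    return (policy == LIMIT_CONTINUE);
}

int
main(void)
{
    mem_budget  b = { 64u << 20, 0 };                        /* 64MB limit */
    size_t      nodes[] = { 32u << 20, 16u << 20, 48u << 20 };

    for (int i = 0; i < 3; i++)
        if (!budget_add_node(&b, nodes[i], LIMIT_REJECT))
            printf("node %d pushed the plan over the per-query limit\n", i);
    return 0;
}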
Yeah, it's all quite tricky and circular. But I'm pretty sure that we need
caps at execution time anyway, so I think it's OK to start at that end of
the problem and then later try to improve the planner side of things.
True.
The proposed approach deals with this by not limiting the per-query (or rather
per-session) budget directly, and instead requesting additional budget when
needed. Which is nice.
I suspect we should also keep an additional plan that is expected to meet
the session_work_mem limit, aside from the regular cheapest plan, and use
it if it's not much worse. Imagine you have a plan with cost 1000 that
needs (global_work_mem/2 + 1kB) of memory, essentially serializing executions
of this query. And then there's an alternative plan with cost 1100 that
can run within session_work_mem. It seems better to just accept the second
plan, because it won't need to wait.
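To make that concrete, a small standalone sketch (the choose_plan() function and the 10% fuzz factor are invented for illustration; nothing like this exists in the planner):

/* "Keep the cheapest plan that fits the session budget alongside the overall
 * cheapest plan, and prefer it unless it's much worse." */
#include <stdio.h>
#include <stddef.h>

typedef struct
{
    const char *name;
    double      cost;
    double      mem_bytes;      /* estimated peak memory use */
} candidate_plan;

static const candidate_plan *
choose_plan(const candidate_plan *cheapest,
            const candidate_plan *cheapest_in_budget,
            double fuzz)        /* e.g. 1.10 = accept up to 10% more cost */
{
    if (cheapest_in_budget == NULL)
        return cheapest;        /* nothing fits: queue, reject, or fall back */
    if (cheapest_in_budget->cost <= cheapest->cost * fuzz)
        return cheapest_in_budget;      /* slightly costlier, but no waiting */
    return cheapest;
}

int
main(void)
{
    /* the example: cost 1000 needing global_work_mem/2 + 1kB (assume 8GB),
     * vs. cost 1100 that fits within session_work_mem */
    candidate_plan  big   = { "big-memory plan", 1000.0, 8.0e9 + 1024 };
    candidate_plan  small = { "in-budget plan",  1100.0, 1.0e9 };

    printf("chosen: %s\n", choose_plan(&big, &small, 1.10)->name);
    return 0;
}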
Hmm. I wonder if it's worth it. You could also just replan as you
said, but I'm wondering if just rejecting the query would be OK.
I think we should not reject queries unnecessarily if there's a workable
execution plan. It's just another optimization criterion, and erroring out
right after planning is essentially saying "can't find a plan". When there
is a plan that we could use, that seems like a bad idea.
Another challenge with work_mem is that anyone can modify it arbitrarily,
i.e. a user can do
SET work_mem = '1TB';
and use as much memory as they wish, or even crash the system. I wonder if
we could define the new GUCs (session_work_mem and global_work_mem) in a
way to prevent this. We probably don't want to make them PGC_POSTMASTER
(it seems useful to allow overriding them in ALTER USER/DATABASE), but I
don't think we have a good way to do that at the moment. Any ideas in this
direction?
How about giving the superuser something like the following GUCs:
global_work_mem = 16GB
session_min_work_mem = 0.5% -- the amount of quota sessions keep, for
fast small queries
session_max_work_mem = 20% -- the maximum quota any one session is allowed
session_extra_work_mem = 5% -- opportunistic execution-time boost
Users are free to plan queries with work_mem = 1TB, but if such a plan
estimates that it wants 512GB, it will be rejected when you try to execute
it, because it exceeds session_max_work_mem, with a hint telling you to turn
down work_mem. Otherwise it either runs, or joins the queue if it can't get
the quota it needs immediately.
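A standalone sketch of that admission decision, using the numbers from the example (the GUCs are only proposed here, and nothing like admit_query() exists in PostgreSQL):

/* Purely illustrative: reject if the estimate exceeds the per-session cap,
 * otherwise grant the quota if it's available, otherwise queue. */
#include <stdio.h>

typedef enum { ADMIT_RUN, ADMIT_QUEUE, ADMIT_REJECT } admit_result;

static const double GIB = 1024.0 * 1024 * 1024;

static admit_result
admit_query(double estimated_mem,       /* planner's memory estimate */
            double global_work_mem,     /* total shared budget */
            double global_in_use,       /* quota already granted to others */
            double session_max_frac)    /* session_max_work_mem, e.g. 0.20 */
{
    double  session_max = global_work_mem * session_max_frac;

    if (estimated_mem > session_max)
        return ADMIT_REJECT;            /* hint: turn down work_mem */
    if (estimated_mem <= global_work_mem - global_in_use)
        return ADMIT_RUN;               /* grant the quota immediately */
    return ADMIT_QUEUE;                 /* wait until enough quota is freed */
}

int
main(void)
{
    const char *names[] = { "run", "queue", "reject" };
    double      global_work_mem = 16 * GIB;

    /* the 512GB plan from the example: over session_max (~3.2GB) */
    printf("512GB plan: %s\n",
           names[admit_query(512 * GIB, global_work_mem, 0, 0.20)]);

    /* a 2GB plan while 15GB is already granted: has to wait */
    printf("2GB plan:   %s\n",
           names[admit_query(2 * GIB, global_work_mem, 15 * GIB, 0.20)]);
    return 0;
}

With global_work_mem = 16GB, those percentages work out to roughly 82MB for session_min_work_mem (0.5%), 3.2GB for session_max_work_mem (20%), and 820MB for session_extra_work_mem (5%).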
Seems reasonable, certainly for v1. I'd keep it as simple as possible.
Eventually we could try to figure out how to set work_mem automatically
(I don't want to propose a concrete rule, but maybe something based on
session_max_work_mem / njoins, with various fudge factors, and some
accounting for parallel workers; it's probably best to low-ball it and
rely on session_extra_work_mem).
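Purely as an illustration of that kind of rule (explicitly not a concrete proposal; the function and the 0.75 fudge factor are invented):

/* Divide the session quota among the memory-hungry nodes, scale for parallel
 * workers, and low-ball it, relying on session_extra_work_mem for the rest. */
#include <stdio.h>

static double
auto_work_mem(double session_max_work_mem,
              int nmem_nodes,           /* hashes, sorts, ... in the plan */
              int nworkers,             /* parallel workers, 0 if serial */
              double fudge)             /* e.g. 0.75, to leave headroom */
{
    if (nmem_nodes < 1)
        nmem_nodes = 1;
    /* each memory-hungry node may be instantiated once per process */
    return session_max_work_mem * fudge / (nmem_nodes * (nworkers + 1));
}

int
main(void)
{
    double  mib = 1024.0 * 1024;
    double  session_max = 3277 * mib;   /* ~20% of 16GB, from the example */

    /* 3 hash/sort nodes, 2 workers -> roughly 273MB per node per process */
    printf("auto work_mem = %.0f MB\n",
           auto_work_mem(session_max, 3, 2, 0.75) / mib);
    return 0;
}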
Hmm, so you'd tweak work_mem for individual queries? Not sure that's
something I'd do at this point - it may seem simple, but I think it's
actually way harder to get right.
For example, let's say you have two views that are each planned nicely on
their own, but then you join them and suddenly the plan is much worse,
because the effective per-node work_mem dropped. That's not great.

Of course, if it's just optional behavior, and the current behavior with an
explicit work_mem value remains the default, then this is not an issue.
Anyway, I'd focus on an MVP that does the bare minimum, simply enforcing a
session limit, and leave this for the future.
Yeah, I think you'd want to be able to set session_XXX on databases
and roles so that you can say your regular users can't eat more than
10% of memory each, but a big reporting thing is allowed more.
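Presumably that would follow the usual GUC precedence. A standalone sketch (the per-role/per-database overrides and the resolution function are hypothetical):

/* Per-role / per-database overrides for the session quota, in the spirit of
 * ALTER ROLE ... SET / ALTER DATABASE ... SET. */
#include <stdio.h>

typedef struct
{
    double  global_default;     /* e.g. 0.10: regular users get 10% */
    double  database_override;  /* < 0 means "not set" */
    double  role_override;      /* < 0 means "not set" */
} quota_settings;

/* same precedence as ordinary GUCs: role beats database beats the default */
static double
effective_session_max(const quota_settings *s)
{
    if (s->role_override >= 0)
        return s->role_override;
    if (s->database_override >= 0)
        return s->database_override;
    return s->global_default;
}

int
main(void)
{
    quota_settings  regular_user = { 0.10, -1, -1 };
    quota_settings  reporting    = { 0.10, -1, 0.40 };  /* big reporting role */

    printf("regular:   %.0f%%\n", effective_session_max(&regular_user) * 100);
    printf("reporting: %.0f%%\n", effective_session_max(&reporting) * 100);
    return 0;
}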
Yeah, something like that.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services