On Tue, Mar 3, 2026 at 7:56 PM Robert Haas <[email protected]> wrote:

Hi Robert,

> On Tue, Mar 3, 2026 at 6:42 AM Jakub Wartak
> <[email protected]> wrote:
> > 1. My first thought is that I found it quite surprising that we now have 3
> > modules, which might cause confusion and a lack of consistency:
> > - 3 can be in shared_preload_libraries (pg_stash_advice,
> > pg_plan_advice, pg_collect_advice)
> > - 2 of them can use CREATE EXTENSION (pg_stash_advice, pg_collect_advice),
> > but
> >   "create extension pg_plan_advice;" fails
> > so maybe they all should behave the same, as people (including me) won't read
> > the docs and will just blindly add them here and there and issue CREATE EXTENSION,
> > but it's going to be hard to remember for which ones (?). So we need more
> > consistency?
>
> That's a possible point of view, but the flip side is that then it's
> all-or-nothing. I think pg_plan_advice is a toolkit that bridges a
> rather large gulf between the theoretical power to walk plan trees and
> use pgs_mask to modify behavior and the practicalities of actually
> doing so. I think we would be well-advised to think of pg_plan_advice
> as a core of functionality that, some day, we might even think of
> moving into core, and pg_collect_advice and pg_stash_advice as things
> that can be built around that core. I think it's important to have
> demonstrations available that show that pg_plan_advice is not a
> "sealed hood" that you must accept exactly as it is, but rather a
> toolkit that you can do a variety of things with. pg_collect_advice
> and pg_stash_advice are examples of what you can do, but many
> variations on those same themes are possible.

This is a micro-thing, feel free to ignore, but I was after something much
simpler: for `CREATE EXTENSION pg_plan_advice` to be a no-op without any failure,
even if it doesn't provide any views or functions right now (so an empty
share/extension/pg_plan_advice.control and --1.0.sql), or at least some dummy
function, just so that CREATE EXTENSION pg_plan_advice wouldn't fail.
This is nothing technical, just a sharp rough edge for the user (though
technically sound) that 2 are deployed with CREATE EXTENSION but the 3rd one is
not. I just think that if we put all 3 into shared_preload_libraries then I won't
have to remember which ones to run CREATE EXTENSION for (I would just blindly do
it for all, and the lack of an error wouldn't make somebody unhappy that something
is potentially not working because CREATE EXTENSION is not meant for it) - it's
pure, user-focused usability feedback.
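To illustrate how small such a no-op could be, here is a sketch of the two files I have in mind (hypothetical contents, assuming the usual extension layout; the comment text and version are my invention):

```
# share/extension/pg_plan_advice.control (sketch)
comment = 'plan advice toolkit (loaded via shared_preload_libraries)'
default_version = '1.0'
relocatable = true
```

```sql
-- share/extension/pg_plan_advice--1.0.sql (sketch)
-- Intentionally defines no objects: CREATE EXTENSION becomes a no-op.
\echo Use "CREATE EXTENSION pg_plan_advice" to load this file. \quit
```

With those two files in place, `CREATE EXTENSION pg_plan_advice;` would succeed and register an (empty) extension, which is all I'm asking for.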

> > 2. Should pgca always duplicate entries like that? (each call gets new 
> > entry?)
> > Shouldn't it just update collection_time for identical calls of
> > userid/dbid/queryid?
>
> To make it do that, we would need a completely different way of
> storing entries, in order to avoid a linear scan of all collected data
> for each new query. It would be a completely different module. I think
> we probably want to have something like that, but it isn't this. See
> previous point.
>
[..]
> The functionality has not changed since I first posted this. It's only
> had bug fixes and now some refactoring.

OK, I misremembered it then.

> > 3c. However, for pgbench -M prepared, such online plan alterations are
> > strangely not effective.
> [....]
> > 3d. ... however restarting pgbench helps and the session
> > changes plan (and thus
> > performance characteristics).
> > 3e. As this was pretty concerning, I repeated the pgbench -M prepared
> > exercise and figured out that I could achieve the effect I wanted
> > (bouncing between fast <-> slow immediately) by injecting a call to
> > InvalidateSystemCaches() via gdb on that backend. Kind of brute-force,
> > but only then did it work.
>
> One thing to keep in mind is that the whole point of -M prepared is to
> avoid replanning. My guess is that what you're seeing here is that
> you only replan when something invalidates the plan, which seems more like
> a feature than a bug, although possibly somebody is going to want a
> way to force an invalidation when the stashed advice changes. That's a
> pretty big hammer, though. Does this behavior go away if you use -M
> extended?

Yes, with -M extended it is instant. I have also found that with
-M prepared I can do a simple one-time `analyze pgbench_accounts` (when changing
SEQ_SCAN <-> INDEX_SCAN for that table) and that is also enough for
the backend to immediately see (and react to) what's in the active
configured stash, even for future changes without further ANALYZEs.
Not sure if pg_stash_advice needs a function to force-flush all backends,
so the plans are 100% effective; apparently we already have ANALYZE for
this, but it is not that obvious that one might want to use it that way.

If there were such a guarantee function, we probably wouldn't see
complaints like 'I have done this and the session is still using the old plan'.
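For the archives, here is roughly what I observed, sketched as a SQL session (the advice-change step is described in prose because the exact pg_set_stashed_advice() call depends on the stash setup; the cache-invalidation behavior of ANALYZE is standard PostgreSQL):

```sql
-- Backend A: a prepared statement caches its plan.
PREPARE q AS SELECT abalance FROM pgbench_accounts WHERE aid = 1;
EXECUTE q;                 -- plan is now cached in this backend

-- Backend B: change the stashed advice (e.g. SEQ_SCAN -> INDEX_SCAN).

-- Backend A:
EXECUTE q;                 -- still runs the old cached plan
ANALYZE pgbench_accounts;  -- invalidates cached plans on that table
EXECUTE q;                 -- replanned; the new advice now takes effect
```

So anything that triggers plan cache invalidation works; the question is only whether we want a dedicated, obvious function for it.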

>
> > 5. QQ: will pg_stash_advice persist the stashes one day?
>
> I have no current plan to implement that, but who knows?

OK, so perhaps the docs for pg_create_advice_stash() and pg_set_stashed_advice()
should mention that those 'stashes' are not persistent across restarts. Without
this I can already hear the screams of some users from the future that they
applied advice, it fixed a problem, and after some time it disappeared (in
other RDBMSs this is persistent, so users coming from there might have such
expectations).

Relatedly, we could also mention that this information is not replicated to
standbys, as it is in-memory only.

> > 6. Any idea for a better name than 'stash'? :) It's a new term that is for
> > sure not widely recognized. Some other names used across the industry: plan
> > stability,
> > advice freeze, plan freeze, plan force, baselines, plan management, force 
> > plan,
> > force paths, SQL plan optimization profiles, query store (MSSQL).
>
> I'm up to change this if there's a consensus on what it should be
> called, but that will require more than 1 vote. I picked stash because
> I wanted to capture the idea of sticking something in a place where
> you could grab it easily. Personally, my main critique of that name is
> that this particular module matches by query ID and you could instead
> match by $WHATEVER, so what would you call the next module that also
> stashes advice strings but keyed in some slightly different way?
> (Likewise, what do you call the next alternative to pg_collect_advice
> that uses different collection criteria or storage?)
>
> Honestly, I was hoping and sort of expecting to get a lot more naming
> feedback back when I first posted this. Don't call it advice, call it
> hints or guidance or tweaks! Don't write HASH_JOIN(foo), write
> JOIN_METHOD(foo, HASH_JOIN)! And so forth. On the one hand, the fact
> that I didn't get that back then has made this less work, and I'm
> still pretty happy with my choices. Furthermore, it's still not too
> late to do some renaming if we agree on what that should look like. At
> the same time, I don't particularly want to get into a fun and
> exciting game of design-by-committee-under-time-pressure. As a
> Demotivators poster said many years ago, "none of us is as dumb as all
> of us." I freely admit that some of my naming and some of my design
> decisions are almost certainly suboptimal, but at the same time, it
> would be easy to underestimate the amount of thought that I've put
> into some of the naming, so I'd only like to change it if we have a
> pretty clear consensus that any given counter-proposal is better than
> what I did. I especially do not want to go change it and then have to
> change it all again because somebody else shows up with a new opinion.

Well, IMHO all the rest of the naming is in pretty great shape, and I think
pg_[collect|plan]_advice are great names too. It's just that the `stash`
keyword doesn't ring a bell for me at all that `pg_stash_advice` is
related in any way to online/transparent/runtime plan modification and
can be used to alter plans for other backends. Something like
`pg_deploy_advice` or `pg_apply_advice` would be more in line with the
other two, but perhaps it's just me..

> > 7. I saw you have written in the docs to be careful about memory use, but
> > wouldn't it be safer if the maximum memory for pgca (when collecting in
> > shared
> > with an almost infinite limit) were still subject to, let's say, 5% of
> > s_b
> > (or any other number here)?
>
> It depends on what you mean by safer. One thing that is unsafe is
> running the computer out of memory. However, another thing that is
> unsafe is getting the wrong answer. I felt that limiting the number of
> queries was a good compromise. If you limit it to a certain amount of
> memory usage, then (1) it's much harder to actually figure out whether
> we're over budget and how much we need to free to be under budget and
> (2) I suspect it's also harder to be sure whether you've set the limit
> high enough to not lose any of the data you intended to retain. I
> think that a fixed limit like 5% of shared_buffers would be, bluntly,
> completely nuts. There is no reason to suppose that the amount of
> memory someone is willing to use to store advice has anything to do
> with the size of shared_buffers. If you have 100 sessions running
> pgbench and you enabled the local collector with that limit in all of
> them, you would be using 500% of shared_buffers in advice-storage
> memory and that would probably take down the server. On the other
> hand, if you're using the shared collector with shared_buffers=2GB,
> that is only 10MB, which might easily be too small to store all the
> data you care about even on a test system.
>
> The reason I set the limits in terms of number of queries was that (1)
> it's easy for the code to figure out whether it's over the limit and
> easy for it to reduce utilization to the limit and (2) I thought that
> people using this were likely to have a better idea how many queries
> they wanted to collect than how many MB/GB they wanted to collect. Of
> course, (1) could be solved with enough effort and (2) could be
> wrong-headed on my part, but if somebody wanted this changed, it would
> have been nice to know that a few months ago when I first posted this
> rather than now.

I haven't reported it earlier because, well, there was little sense playing with
the collection implementation much earlier if we didn't have pinning of the plans
themselves (and the whole query_id debate also seemed pointless back then).
Anyway, it is much like the work_mem danger, but work_mem is about a well-defined
amount of memory (* N plan nodes), so it's just easier to put a safe value there.

How about we just measure (estimate), with some small table, the number of
entries vs. memory used and put that into the docs, so users are wary that
they shouldn't just blindly set it to a high value? E.g. with a
1000 local limit I get ~280kB for the pg_collect_advice context, and with a
1000000 local limit I got it to ~50MB and stopped looking further (it was still
growing). By itself that's not terrible, but higher values with lots of backends
might cause huge memory pressure (OOMs).
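This is how I measured it, in case anyone wants to reproduce the numbers (the view pg_backend_memory_contexts is standard since PG 14; the context name 'pg_collect_advice' is what I saw in my build and may differ):

```sql
-- Show advice-related memory contexts of the current backend.
SELECT name, total_bytes, used_bytes
FROM pg_backend_memory_contexts
WHERE name ILIKE '%advice%';
```

Running that before and after a pgbench run with different local limits gives the per-backend numbers above, which could go straight into a docs table.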

Or maybe another idea: is there a possibility of making GUCs like
local_collection_limits/local_collector settable only using
SET/SET LOCAL, but not globally? I mean, what's the point of being
able to collect locally system-wide when realistically I cannot
pull it back from backend-local memory? (And this removes the danger
of multiple backends going wild with memory together.)

The below may be useful for others trying this: if testing performance,
please ensure you do not have a cassert build, as with casserts enabled,
higher local_collection_limits are barely usable:

progress: 137.0 s, 1516.9 tps, lat 0.659 ms stddev 0.174, 0 failed
progress: 138.0 s, 1469.1 tps, lat 0.680 ms stddev 0.174, 0 failed
progress: 139.0 s, 1050.0 tps, lat 0.952 ms stddev 0.206, 0 failed
progress: 140.0 s, 922.0 tps, lat 1.085 ms stddev 0.194, 0 failed
[..]
progress: 143.0 s, 664.0 tps, lat 1.504 ms stddev 0.377, 0 failed
progress: 144.0 s, 595.0 tps, lat 1.681 ms stddev 0.386, 0 failed
progress: 145.0 s, 531.0 tps, lat 1.883 ms stddev 0.402, 0 failed
[..]
progress: 159.0 s, 260.0 tps, lat 3.840 ms stddev 0.607, 0 failed
progress: 160.0 s, 271.0 tps, lat 3.702 ms stddev 0.447, 0 failed

Resetting it back to 0, or disabling the local collector and reloading, won't
fix it; the backend needs to re-establish its connection. Even with just a 10000
local collection limit it drops from ~1500 tps to 900 tps.
The CPU impact seems to come from exec_simple_query()
-> finish_xact_command() -> MemoryContextCheck() -> AllocSetCheck(),
so memory context validation that has literally nothing to do
with this patch (other than it using a lot of memory in those scenarios).
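For anyone unsure whether their build has assertions enabled, there is a quick server-side check (debug_assertions is a standard read-only GUC):

```sql
-- 'on' means a cassert build: expect large MemoryContextCheck() overheads.
SHOW debug_assertions;
```

Benchmark numbers from an assert-enabled build should not be compared with production builds at all.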

> > 9. If IsQueryIdEnabled() is false (even after trying out to use 'auto')
> > shouldn't this module raise warning when pg_stash_advice.stash_name != NULL?
>
> I think the bar for printing a warning on every single query is
> extremely high. Doing that just because of a questionable
> configuration setting doesn't seem like a good idea.

Why on every single query? I'm thinking this should bail out in pgca & pgsa
during initialization of those modules. What's the point of those modules
if queryid is always 0? (I'm assuming somebody has compute_query_id=off and
still loaded those modules.)
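The misconfiguration is easy to spot from SQL, which is why I think a one-time check at module init would be cheap (both the GUC and the pg_stat_activity column are standard since PG 14):

```sql
-- 'off' means every query gets queryid = 0, so queryid-keyed
-- advice matching can never succeed.
SHOW compute_query_id;

-- Confirm what this backend actually computes for its own query:
SELECT query_id FROM pg_stat_activity WHERE pid = pg_backend_pid();
```

One WARNING at _PG_init() / extension startup time would cover that case without any per-query cost.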

> > 10. I'm was here mainly for v18-0007 (pg_stash_advice), but this still looks
> > like some small bug to me (minmax matching in v18-0003):
> >
[..]
>
> I think this is just out of scope for now. It's documented that we
> don't have a way of controlling aggregation behavior at present. [..]

Oh OK, I had completely missed the "pgplanadvice-limitations" SGML section about
aggregates, so sure - fair limitation.

BTW: the good news is that I haven't seen a single crash while throwing
wild stuff at it or trying out strange ideas with pg_stash_advice.

-J.
