On 26/8/2024 18:32, Robert Haas wrote:
I'm somewhat expecting to be flamed to a well-done crisp for saying
this, but I think we need better ways for extensions to control the
behavior of PostgreSQL's query planner. I know of two major reasons
It is the change I have been waiting for a long time. Remember how many
kludge codes in pg_hint_plan, aqo, citus, timescale, etc., are written
for only the reason of a small number of hooks - I guess many other
people could cheer such work.
why somebody might want to do this. First, you might want to do
something like what pg_hint_plan does, where it essentially implements
Oracle-style hints that can be either inline or stored in a side table
and automatically applied to queries.[1] In addition to supporting
Oracle-style hints, it also supports some other kinds of hints so that
you can, for example, try to fix broken cardinality estimates. Second,
My personal most wanted list:
- Selectivity list estimation hook
- Groups number estimation hook
- hooks on memory estimations, involving work_mem
- add_path() hook
- Hook on final RelOptInfo pathlist
- a custom list of nodes in RelOptinfo, PlannerStmt, Plan and Query
structures
- Extensibility of extended and plain statistics
- Hook on portal error processing
- Canonicalise expressions hook
you might want to convince the planner to keep producing the same kind
of plan that it produced previously. I believe this is what Amazon's
query plan management feature[2] does, although since it is closed
source and I don't work at Amazon maybe it's actually implemented
completely differently. Regardless of what Amazon did in this case,
plan stability is a feature people want. Just trying to keep using the
same plan data structure forever doesn't seem like a good strategy,
because for example it would be fragile in the case of any DDL
changes, like dropping and recreating an index, or dropping or adding
As a designer of plan freezing feature [1] I can say it utilises
plancache and, being under its invalidation callbacks it doesn't afraid
DDL or any other stuff altering database objects.
Unfortunately, the part about the hook having the freedom to delete
paths isn't really true. Perhaps technically you can delete a path
that you don't want to be chosen, but any paths that were dominated by
the path you deleted have already been thrown away and it's too late
to get them back. You can modify paths if you don't want to change
their costs, but if you change their costs then you have the same
problem: the contents of the pathlist at the time that you see it are
determined by the costs that each path had when it was initially
added, and it's really too late to editorialize on that. So all you
can really do here in practice is add new paths.
From my standpoint, it is enough to export routines creating paths and
calculating costs.
set_join_pathlist_hook, which applies to joinrels, is similarly
limited. appendrels don't even have an equivalent of this hook.
So, how could we do better?
I think there are two basic approaches that are possible here. If
someone sees a third option, let me know. First, we could allow users
to hook add_path() and add_partial_path(). That certainly provides the
flexibility on paper to accept or reject whatever paths you do or do
+1
The attached patch, briefly mentioned above, essentially converts the
enable_* GUCs into RelOptInfo properties where the defaults are set by
the corresponding GUCs. The idea is that a hook could then change this
on a per-RelOptInfo basis before path generation happens. For
IMO, it is better not to switch on/off algorithms, but allow extensions
to change their cost multipliers, modifying costs balance. 10E9 looks
like a disable, but multiplier == 10 for a cost node just provide more
freedom for hashing strategies.
[1] https://postgrespro.com/docs/enterprise/16/sr-plan
--
regards, Andrei Lepikhov