Re: pg_plan_advice

Robert Haas Tue, 06 Jan 2026 11:36:32 -0800

On Mon, Dec 29, 2025 at 8:15 PM Lukas Fittl <[email protected]> wrote:
> For 0001, I'm not sure the following comment is correct:
>
> > /* When recursing = true, it's an unplanned or dummy subquery. */
> > rtinfo->dummy = recursing;
>
> Later in that function we only recurse if its a dummy subquery - in the case 
> of an unplanned subquery (rel->subroot == NULL)
> add_rtes_to_flat_rtable won't be called again (instead the relation RTEs are 
> directly added to the finalrtable). Maybe we can
> clarify that comment as "When recursing = true, it's a dummy subquery or its 
> children.".


Presumably, a child of an unplanned or dummy subquery will also be
unplanned or dummy, so I'm not sure I understand the need to clarify
here.

> From my medium-level understanding of the planner, I don't think the lack of 
> tracking unplanned subqueries
> in subrtinfos is a problem, at least for the pg_plan_advice type use cases.

I don't think so, either. I believe that anything that falls into this
category is something that is not actually going to be reflected in
the final plan tree, but we can't lose track of it completely because
it can matter for purposes like locking or invalidation. Since plan
advice only targets things that appear in the final plan tree, it
shouldn't care. If we did want to care, e.g. to emit advice like
WE_ARE_EXPECTING_THIS_TO_BE_DUMMY(whatever_table), we'd need more than
an rtoffset per subquery; we'd have to map each RTI individually,
because some RTIs are tossed completely for in the "dummy" case,
meaning that the rtoffset isn't constant for the whole subquery.
AFAICT, this is not an issue because we need not care about the dummy
subqueries at all. The only reason I included the SubPlanRTInfo at all
for this case is that the previous SubPlanRTInfo might be for a
non-dummy subquery, and some code might want to look at the next entry
in the list to see where the portion of the range table belonging to
that previous subqueries ends. This lets you do that.

> For 0002:
>
> It might be helpful to clarify in a comment that ElidedNode's plan_node_id 
> represents the surviving node, not that of the elided node.

Good point. I'll add this comment:

+ *
+ * plan_node_id is that of the surviving plan node, the sole child of the
+ * one which was elided.

> I also noticed that this currently doesn't support cases where multiple nodes 
> are elided, e.g. with multi-level table partitioning:
>
> CREATE TABLE pt (l1 date, l2 text) PARTITION BY RANGE (l1);
> CREATE TABLE pt_202512 PARTITION OF pt FOR VALUES FROM ('2025-12-01') TO 
> ('2026-01-01') PARTITION BY LIST (l2);
> CREATE TABLE pt_202512_TEST PARTITION OF pt_202512 FOR VALUES IN ('TEST');
>
> EXPLAIN (RANGE_TABLE) SELECT * FROM pt WHERE l1 = '2025-12-15' AND l2 = 
> 'TEST';
>
>                             QUERY PLAN
> -------------------------------------------------------------------
>  Seq Scan on pt_202512_test pt  (cost=0.00..29.05 rows=1 width=36)
>    Filter: ((l1 = '2025-12-15'::date) AND (l2 = 'TEST'::text))
>    Scan RTI: 3
>    Elided Node Type: Append
>    Elided Node RTIs: 1    <=== This is missing RTI 2
>  RTI 1 (relation, inherited, in-from-clause):
>    Relation: pt
>  RTI 2 (relation, inherited, in-from-clause):
>    Relation: pt_202512
>  RTI 3 (relation, in-from-clause):
>    Relation: pt_202512_test
>  Unprunable RTIs: 1 2 3
>
> In a quick test, adding child_append_relid_sets (from 0003) to the relids 
> being passed to record_elided_node fixes
> that. Presumably the case of partitionwise join relids doesn't matter, 
> because that would prevent it being elided.

I'm not really sure there's a problem here. We definitely do not want
to end up with something like "Elided Node RTIs: 1 2". What I've found
experimentally is that it's often important to preserve relid sets,
but you need to preserve them as sets, not individually. So there
could be an argument that we somehow want to preserve both {1} and {2}
here, but that's not equivalent to {1,2}, which looks like a
partitionwise join between relid 1 and relid 2. But it isn't
especially clear to me that we actually need to preserve RTI 2 here.
One reason why preserving RTIs is important is so that as we descend a
join tree, we can find the RTIs that the optimizer thought it was
joining, but that only requires finding RTI 1, not RTI 2. Another
reason why preserving RTIs is important is so that we can use relation
identifiers to describe planning decisions made with respect to those
RTIs, but that doesn't apply here because partition expansion just
always happens.

Of course, it's quite possible that there are reasons unrelated to
this patch set why this information would be good to preserve, but if
we want to do it, we're going to have to adjust the data
representation somehow. We'd either need to give the ElidedNode a
"cars" representation instead of a single RTI set, or we'd need to
have some separate way of representing this. I hesitate a little bit
to design something without a use case in mind, but maybe you have
one?

> For 0003:
>
> I also find the "cars" variable suffix a bit hard to understand, but not sure 
> a comment next to the variables is that useful.
> Separately, the noise generated by all the additional "_cars" variables isn't 
> great.
>
> I wonder a little bit if we couldn't introduce a better abstraction here, 
> e.g. a struct "AppendPathInput" that contains the
> two related lists, and gets populated by 
> accumulate_append_subpath/get_singleton_append_subpath and then
> passed to create_append_path as a single argument.

I spent some time thinking about this day and haven't been quite able
to come up with something that I like. The problem is that
pa_partial_subpaths and pa_nonpartial_subpaths share a single
child_append_relid_sets variable, namely pa_subpath_cars, and
accumulate_append_subpaths gets called with that as the last argument
and different things for the previous two. One thing I tried was
making the AppendPathInput struct contain three lists rather than two,
but then accumulate_append_subpath() needs an argument that makes it
work in one of three different modes:

Mode 1: normal -- add everything to the "normal" list
Mode 2: building parallel-aware append with partial path -- add things
to the "normal" list except for parallel-aware appends which need to
be split between the normal and special lists
Mode 3: building parallel-aware append with non-partial path -- add
things to the "special" list

I also tried splitting up accumulate_append_subpath() into two
functions, thinking that maybe I could segregate the parallel-append
handling from the more normal cases. This seems somewhat appealing in
the sense that having accumulate_append_subpath() hold a bunch of
extra logic that only one call site needs isn't very nice, but
changing it doesn't really seem to help with the problem that we have
two subpath lists sharing one cars list in this case. I'll try to find
some more time to think about this, but if you have any ideas
meanwhile, I'd be happy to hear them.

> Note that 0005 needs a rebase, since 48d4a1423d2e92d10077365532d92e059ba2eb2e 
> changed the GetNamedDSMSegment API.

I'll fix this. I don't love the way that commit made the callback and
the callback arg non-consecutive arguments.

> You may also want to move the CF entry to the PG19-4 commitfest so CFbot runs 
> again.

It seems that I cannot move it to 19-4. I moved it to 19-Final.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: pg_plan_advice

Reply via email to