(2019/04/26 3:24), Tom Lane wrote:
PG Bug reporting form<nore...@postgresql.org> writes:
[ this crashes if ft4 is a postgres_fdw foreign table: ]
select exists(select c1 from ft4), avg(c1) from ft4 where c1 = (select
max(c1) from ft4);
Hm, the max() subquery isn't necessary, this is sufficient:
select exists(select c1 from ft4), avg(c1) from ft4 where c1 = 42;
That produces a plan like
QUERY PLAN
-----------------------------------------------------------------------------------
Foreign Scan (cost=200.07..246.67 rows=1 width=33)
Output: ($0), (avg(ft4.c1))
Relations: Aggregate on (public.ft4)
Remote SQL: SELECT $1::boolean, avg(c1) FROM "S 1"."T 3" WHERE ((c1 = 432))
InitPlan 1 (returns $0)
-> Foreign Scan on public.ft4 ft4_1 (cost=100.00..212.39 rows=3413
width=0)
Remote SQL: SELECT NULL FROM "S 1"."T 3"
(7 rows)
Now one's first observation about that is that it's kind of dumb to send
the result of the locally-executed InitPlan over to the far end only to
read it back. So maybe we should be thinking about how to avoid that.
We do avoid it for plain foreign scans:
regression=# explain verbose
select exists(select c1 from ft4), * from ft4 where c1 = 42;
QUERY PLAN
-----------------------------------------------------------------------------------
Foreign Scan on public.ft4 (cost=200.03..226.15 rows=6 width=41)
Output: $0, ft4.c1, ft4.c2, ft4.c3
Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 3" WHERE ((c1 = 42))
InitPlan 1 (returns $0)
-> Foreign Scan on public.ft4 ft4_1 (cost=100.00..212.39 rows=3413
width=0)
Remote SQL: SELECT NULL FROM "S 1"."T 3"
(6 rows)
and also for foreign joins:
regression=# explain verbose
select exists(select c1 from ft4), * from ft4, ft4 ft4b where ft4.c1 = 42 and
ft4b.c1 = 43;
QUERY
PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
Foreign Scan (cost=200.03..252.93 rows=36 width=81)
Output: $0, ft4.c1, ft4.c2, ft4.c3, ft4b.c1, ft4b.c2, ft4b.c3
Relations: (public.ft4) INNER JOIN (public.ft4 ft4b)
Remote SQL: SELECT r1.c1, r1.c2, r1.c3, r2.c1, r2.c2, r2.c3 FROM ("S 1"."T 3" r1 INNER JOIN
"S 1"."T 3" r2 ON (((r2.c1 = 43)) AND ((r1.c1 = 42))))
InitPlan 1 (returns $0)
-> Foreign Scan on public.ft4 ft4_1 (cost=100.00..212.39 rows=3413
width=0)
Remote SQL: SELECT NULL FROM "S 1"."T 3"
(7 rows)
but the code for upper-relation scans is apparently stupider than either
of those cases.
The proximate cause of the crash is that we have {PARAM 1}
(representing the output of the InitPlan) in the path's fdw_exprs, and
also the identical expression in fdw_scan_tlist, and that means that when
setrefs.c processes the ForeignScan node it thinks it should replace the
{PARAM 1} in fdw_exprs with a Var representing a reference to the
fdw_scan_tlist entry. That would be fine if the fdw_exprs represented
expressions to be evaluated over the output of the foreign scan, but of
course they don't --- postgres_fdw uses fdw_exprs to compute values to be
sent to the remote end, instead. So we crash at runtime because there's
no slot to supply such output to the fdw_exprs.
I was able to make the crash go away by removing this statement from
set_foreignscan_references:
fscan->fdw_exprs = (List *)
fix_upper_expr(root,
(Node *) fscan->fdw_exprs,
itlist,
INDEX_VAR,
rtoffset);
and we still pass check-world without that (which means we lack test
coverage, because the minimum that should happen to fdw_exprs is
fix_scan_list :-(). But I do not think that's an acceptable route to
a patch, because it amounts to having the core code know what the FDW
is using fdw_exprs for, and we explicitly disclaim any assumptions about
that. fdw_exprs is specified to be processed the same as other
expressions in the same plan node, so I think this fix_upper_expr call
probably ought to stay like it is, even though it's not really the right
thing for postgres_fdw. It might be the right thing for other FDWs.
(One could imagine, perhaps, having some flag in the ForeignPlan
node that tells setrefs.c what to do. But that would be an API break
for FDWs, so it wouldn't be a back-patchable solution.)
(Actually, it seems to me that set_foreignscan_references is *already*
assuming too much about the semantics of these expressions in upper
plan nodes, so maybe we need to have a chat about that anyway.)
If we do leave it like this, then the only way for postgres_fdw to
avoid trouble is to not have any entries in fdw_exprs that exactly
match entries in fdw_scan_tlist. So that pretty much devolves back
to what I said before: don't ship values to the far end that are
just going to be fed back as-is. But now it's a correctness
requirement not just an optimization.
Thanks for taking care of this, as usual!
I haven't had anything to do with postgres_fdw's upper-relation-pushdown
code, so I am not sure why it's stupider than the other cases.
Thoughts anybody?
I worked on the ORDERED/FINAL-upperrel pushdown for PG12, but I don't
think that that's directly related to this issue, because this arises in
PG11 already. Maybe I'm missing something, but the
UPPERREL_GROUP_AGG-upperrel pushdown added in PG10 is likely to be
related to this. I'll work on this issue unless somebody wants to. But
I'll take a 10-day vocation from tomorrow, so I don't think I'll be able
to fix this in the next minor release...
Best regards,
Etsuro Fujita