(2019/03/06 18:33), Amit Langote wrote:
I noticed a bug with how UPDATE tuple routing initializes ResultRelInfos
to use for partition routing targets. Specifically, the bug occurs when
UPDATE targets include a foreign partition that is locally modified (as
opposed to being modified directly on the remove server) and its
ResultRelInfo (called subplan result rel in the source code) is reused for
tuple routing:
-- setup
create extension postgres_fdw ;
create server loopback foreign data wrapper postgres_fdw;
create user mapping for current_user server loopback;
create table p (a int) partition by list (a);
create table p1 partition of p for values in (1);
create table p2base (a int check (a = 2));
create foreign table p2 partition of p for values in (2) server loopback
options (table_name 'p2base');
insert into p values (1), (2);
-- see in the plan that foreign partition p2 is locally modified
explain verbose update p set a = 2 from (values (1), (2)) s(x) where a =
s.x returning *;
QUERY PLAN
─────────────────────────────────────────────────────────────────────────────────
Update on public.p (cost=0.05..236.97 rows=50 width=42)
Output: p1.a, "*VALUES*".column1
Update on public.p1
Foreign Update on public.p2
Remote SQL: UPDATE public.p2base SET a = $2 WHERE ctid = $1 RETURNING a
-> Hash Join (cost=0.05..45.37 rows=26 width=42)
Output: 2, p1.ctid, "*VALUES*".*, "*VALUES*".column1
Hash Cond: (p1.a = "*VALUES*".column1)
-> Seq Scan on public.p1 (cost=0.00..35.50 rows=2550 width=10)
Output: p1.ctid, p1.a
-> Hash (cost=0.03..0.03 rows=2 width=32)
Output: "*VALUES*".*, "*VALUES*".column1
-> Values Scan on "*VALUES*" (cost=0.00..0.03 rows=2
width=32)
Output: "*VALUES*".*, "*VALUES*".column1
-> Hash Join (cost=100.05..191.59 rows=24 width=42)
Output: 2, p2.ctid, "*VALUES*".*, "*VALUES*".column1
Hash Cond: (p2.a = "*VALUES*".column1)
-> Foreign Scan on public.p2 (cost=100.00..182.27 rows=2409
width=10)
Output: p2.ctid, p2.a
Remote SQL: SELECT a, ctid FROM public.p2base FOR UPDATE
-> Hash (cost=0.03..0.03 rows=2 width=32)
Output: "*VALUES*".*, "*VALUES*".column1
-> Values Scan on "*VALUES*" (cost=0.00..0.03 rows=2
width=32)
Output: "*VALUES*".*, "*VALUES*".column1
-- as opposed to directly on remote side (because there's no local join)
explain verbose update p set a = 2 returning *;
QUERY PLAN
─────────────────────────────────────────────────────────────────────────────
Update on public.p (cost=0.00..227.40 rows=5280 width=10)
Output: p1.a
Update on public.p1
Foreign Update on public.p2
-> Seq Scan on public.p1 (cost=0.00..35.50 rows=2550 width=10)
Output: 2, p1.ctid
-> Foreign Update on public.p2 (cost=100.00..191.90 rows=2730 width=10)
Remote SQL: UPDATE public.p2base SET a = 2 RETURNING a
(8 rows)
Running the first update query crashes:
update p set a = 2 from (values (1), (2)) s(x) where a = s.x returning
tableoid::regclass, *;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
The problem seems to occur because ExecSetupPartitionTupleRouting thinks
it can reuse p2's ResultRelInfo for tuple routing. In this case, it can't
be used, because its ri_FdwState contains information that will be needed
for postgresExecForeignUpdate to work, but it's still used today. Because
it's assigned to be used for tuple routing, its ri_FdwState will be
overwritten by postgresBeginForeignInsert that's invoked by the tuple
routing code using the aforementioned ResultRelInfo. Crash occurs when
postgresExecForeignUpdate () can't find the information it's expecting in
the ri_FdwState.
Agreed, as I said before in another thread.
The solution is to teach ExecSetupPartitionTupleRouting to avoid using a
subplan result rel if its ri_FdwState is already set.
I'm not sure that is a good idea, because that requires to create a new
ResultRelInfo, which is not free; I think it requires a lot of work.
Another solution to avoid that would be to store the fmstate created by
postgresBeginForeignInsert() into the ri_FdwState, not overwrite the
ri_FdwState, like the attached. This would not need any changes to the
core, and this would not cause any overheads either, IIUC. What do you
think about that?
I have attached 2 patches, one for PG 11 where the problem first appeared
and another for HEAD. The patch for PG 11 is significantly bigger due to
having to handle the complexities of mapping of subplan result rel indexes
to leaf partition indexes. Most of that complexity is gone in HEAD due to
3f2393edefa5, so the patch for HEAD is much simpler.
Thanks for the patches!
I've also added a
test in postgres_fdw.sql to exercise this test case.
Thanks again, but the test case you added works well without any fix:
+-- Check case where the foreign partition is a subplan target rel and
+-- foreign parttion is locally modified (target table being joined
+-- locally prevents a direct/remote modification plan).
+explain (verbose, costs off)
+update utrtest set a = 1 from (values (1), (2)) s(x) where a = s.x
returning *;
+ QUERY PLAN
+------------------------------------------------------------------------------
+ Update on public.utrtest
+ Output: remp.a, remp.b, "*VALUES*".column1
+ Foreign Update on public.remp
+ Remote SQL: UPDATE public.loct SET a = $2 WHERE ctid = $1
RETURNING a, b
+ Update on public.locp
+ -> Hash Join
+ Output: 1, remp.b, remp.ctid, "*VALUES*".*, "*VALUES*".column1
+ Hash Cond: (remp.a = "*VALUES*".column1)
+ -> Foreign Scan on public.remp
+ Output: remp.b, remp.ctid, remp.a
+ Remote SQL: SELECT a, b, ctid FROM public.loct FOR UPDATE
+ -> Hash
+ Output: "*VALUES*".*, "*VALUES*".column1
+ -> Values Scan on "*VALUES*"
+ Output: "*VALUES*".*, "*VALUES*".column1
+ -> Hash Join
+ Output: 1, locp.b, locp.ctid, "*VALUES*".*, "*VALUES*".column1
+ Hash Cond: (locp.a = "*VALUES*".column1)
+ -> Seq Scan on public.locp
+ Output: locp.b, locp.ctid, locp.a
+ -> Hash
+ Output: "*VALUES*".*, "*VALUES*".column1
+ -> Values Scan on "*VALUES*"
+ Output: "*VALUES*".*, "*VALUES*".column1
+(24 rows)
In this test case, the foreign partition is updated before ri_FdwState
is overwritten by postgresBeginForeignInsert() that's invoked by the
tuple routing code, so it would work without any fix. So I modified
this test case as such.
Sorry for the long delay, again.
Best regards,
Etsuro Fujita
*** a/contrib/postgres_fdw/expected/postgres_fdw.out
--- b/contrib/postgres_fdw/expected/postgres_fdw.out
***************
*** 7733,7738 **** update utrtest set a = 1 where a = 2 returning *;
--- 7733,7789 ----
1 | qux triggered !
(1 row)
+ delete from utrtest;
+ -- Change the definition of utrtest so that the local partition is updated
+ -- before the foreign partition
+ alter table utrtest detach partition remp;
+ drop foreign table remp;
+ alter table loct drop constraint loct_a_check;
+ alter table loct add check (a in (3));
+ create foreign table remp (a int check (a in (3)), b text) server loopback options (table_name 'loct');
+ alter table utrtest attach partition remp for values in (3);
+ insert into utrtest values (2, 'qux');
+ insert into utrtest values (3, 'xyzzy');
+ -- Check case where the foreign partition is a subplan target rel and is
+ -- updated with a non-direct modification plan
+ explain (verbose, costs off)
+ update utrtest set a = 3 from (values (2), (3)) s(x) where a = s.x returning *;
+ QUERY PLAN
+ ------------------------------------------------------------------------------
+ Update on public.utrtest
+ Output: locp.a, locp.b, "*VALUES*".column1
+ Update on public.locp
+ Foreign Update on public.remp
+ Remote SQL: UPDATE public.loct SET a = $2 WHERE ctid = $1 RETURNING a, b
+ -> Hash Join
+ Output: 3, locp.b, locp.ctid, "*VALUES*".*, "*VALUES*".column1
+ Hash Cond: (locp.a = "*VALUES*".column1)
+ -> Seq Scan on public.locp
+ Output: locp.b, locp.ctid, locp.a
+ -> Hash
+ Output: "*VALUES*".*, "*VALUES*".column1
+ -> Values Scan on "*VALUES*"
+ Output: "*VALUES*".*, "*VALUES*".column1
+ -> Hash Join
+ Output: 3, remp.b, remp.ctid, "*VALUES*".*, "*VALUES*".column1
+ Hash Cond: (remp.a = "*VALUES*".column1)
+ -> Foreign Scan on public.remp
+ Output: remp.b, remp.ctid, remp.a
+ Remote SQL: SELECT a, b, ctid FROM public.loct FOR UPDATE
+ -> Hash
+ Output: "*VALUES*".*, "*VALUES*".column1
+ -> Values Scan on "*VALUES*"
+ Output: "*VALUES*".*, "*VALUES*".column1
+ (24 rows)
+
+ update utrtest set a = 3 from (values (2), (3)) s(x) where a = s.x returning *;
+ a | b | x
+ ---+-------------------+---
+ 3 | qux triggered ! | 2
+ 3 | xyzzy triggered ! | 3
+ 3 | qux triggered ! | 3
+ (3 rows)
+
drop trigger loct_br_insert_trigger on loct;
drop table utrtest;
drop table loct;
*** a/contrib/postgres_fdw/postgres_fdw.c
--- b/contrib/postgres_fdw/postgres_fdw.c
***************
*** 185,190 **** typedef struct PgFdwModifyState
--- 185,194 ----
/* working memory context */
MemoryContext temp_cxt; /* context for per-tuple temporary data */
+
+ /* for update row movement, if subplan resultrel */
+ struct PgFdwModifyState *sub_fmstate; /* foreign-insert state, if
+ * created */
} PgFdwModifyState;
/*
***************
*** 1844,1851 **** postgresExecForeignInsert(EState *estate,
TupleTableSlot *slot,
TupleTableSlot *planSlot)
{
! return execute_foreign_modify(estate, resultRelInfo, CMD_INSERT,
slot, planSlot);
}
/*
--- 1848,1871 ----
TupleTableSlot *slot,
TupleTableSlot *planSlot)
{
! PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
! TupleTableSlot *rslot;
!
! /*
! * If the fmstate has sub_fmstate set, it means the foreign table is an
! * UPDATE subplan resultrel, in which case, the sub_fmstate is provided
! * for the INSERT operation on the table, whereas the fmstate is provided
! * for the UPDATE operation on the table; use the sub_fmstate
! */
! if (fmstate->sub_fmstate)
! resultRelInfo->ri_FdwState = fmstate->sub_fmstate;
! rslot = execute_foreign_modify(estate, resultRelInfo, CMD_INSERT,
slot, planSlot);
+ /* Revert that change */
+ if (fmstate->sub_fmstate)
+ resultRelInfo->ri_FdwState = fmstate;
+
+ return rslot;
}
/*
***************
*** 1983,1989 **** postgresBeginForeignInsert(ModifyTableState *mtstate,
retrieved_attrs != NIL,
retrieved_attrs);
! resultRelInfo->ri_FdwState = fmstate;
}
/*
--- 2003,2021 ----
retrieved_attrs != NIL,
retrieved_attrs);
! /*
! * If the given resultRelInfo already has PgFdwModifyState set, it means
! * the foreign table is an UPDATE subplan resultrel; in which case, store
! * the resulting state into the PgFdwModifyState.
! */
! if (resultRelInfo->ri_FdwState)
! {
! Assert(plan && plan->operation == CMD_UPDATE);
! Assert(resultRelInfo->ri_usesFdwDirectModify == false);
! ((PgFdwModifyState *) resultRelInfo->ri_FdwState)->sub_fmstate = fmstate;
! }
! else
! resultRelInfo->ri_FdwState = fmstate;
}
/*
***************
*** 1998,2003 **** postgresEndForeignInsert(EState *estate,
--- 2030,2044 ----
Assert(fmstate != NULL);
+ /*
+ * If the fmstate has sub_fmstate set, it means the foreign table is an
+ * UPDATE subplan resultrel, in which case, the sub_fmstate is provided
+ * for the INSERT operation on the table, whereas the fmstate is provided
+ * for the UPDATE operation on the table; get the sub_fmstate
+ */
+ if (fmstate->sub_fmstate)
+ fmstate = fmstate->sub_fmstate;
+
/* Destroy the execution state */
finish_foreign_modify(fmstate);
}
***************
*** 3482,3487 **** create_foreign_modify(EState *estate,
--- 3523,3531 ----
Assert(fmstate->p_nums <= n_params);
+ /* Initialize sub execution state */
+ fmstate->sub_fmstate = NULL;
+
return fmstate;
}
*** a/contrib/postgres_fdw/sql/postgres_fdw.sql
--- b/contrib/postgres_fdw/sql/postgres_fdw.sql
***************
*** 2013,2018 **** update utrtest set a = 1 where a = 2 returning *;
--- 2013,2038 ----
-- The new values are concatenated with ' triggered !'
update utrtest set a = 1 where a = 2 returning *;
+ delete from utrtest;
+
+ -- Change the definition of utrtest so that the local partition is updated
+ -- before the foreign partition
+ alter table utrtest detach partition remp;
+ drop foreign table remp;
+ alter table loct drop constraint loct_a_check;
+ alter table loct add check (a in (3));
+ create foreign table remp (a int check (a in (3)), b text) server loopback options (table_name 'loct');
+ alter table utrtest attach partition remp for values in (3);
+
+ insert into utrtest values (2, 'qux');
+ insert into utrtest values (3, 'xyzzy');
+
+ -- Check case where the foreign partition is a subplan target rel and is
+ -- updated with a non-direct modification plan
+ explain (verbose, costs off)
+ update utrtest set a = 3 from (values (2), (3)) s(x) where a = s.x returning *;
+ update utrtest set a = 3 from (values (2), (3)) s(x) where a = s.x returning *;
+
drop trigger loct_br_insert_trigger on loct;
drop table utrtest;