On Fri, Feb 26, 2021 at 10:37 AM Amit Langote <amitlangot...@gmail.com> wrote: > > I realized that there is a race condition that will allow a concurrent > backend to invalidate a parallel plan (for insert into a partitioned > table) at a point in time when it's too late for plancache.c to detect > it. It has to do with how plancache.c locks the relations affected by > a cached query and its cached plan. Specifically, > AcquireExecutorLocks(), which locks the relations that need to be > locked before the plan could be considered safe to execute, does not > notice the partitions that would have been checked for parallel safety > when the plan was made. Given that AcquireExecutorLocks() only loops > over PlannedStmt.rtable and locks exactly the relations found there, > partitions are not locked. This means that a concurrent backend can, > for example, add an unsafe trigger to a partition before a parallel > worker inserts into it only to fail when it does. > > Steps to reproduce (tried with v19 set of patches): > > drop table if exists rp, foo; > create table rp (a int) partition by range (a); > create table rp1 partition of rp for values from (minvalue) to (0); > create table rp2 partition of rp for values from (0) to (maxvalue); > create table foo (a) as select generate_series(1, 1000000); > set plan_cache_mode to force_generic_plan; > set enable_parallel_dml to on; > deallocate all; > prepare q as insert into rp select * from foo where a%2 = 0; > explain analyze execute q; > > At this point, attach a debugger to this backend and set a breakpoint > in AcquireExecutorLocks() and execute q again: > > -- will execute the cached plan > explain analyze execute q; > > Breakpoint will be hit. Continue till return from the function and > stop. Start another backend and execute this: > > -- will go through, because no lock still taken on the partition > create or replace function make_table () returns trigger language > plpgsql as $$ begin create table bar(); return null; end; $$ parallel > unsafe; > create trigger ai_rp2 after insert on rp2 for each row execute > function make_table(); > > Back to the debugger, hit continue to let the plan execute. You > should see this error: > > ERROR: cannot start commands during a parallel operation > CONTEXT: SQL statement "create table bar()" > PL/pgSQL function make_table() line 1 at SQL statement parallel worker > > The attached patch fixes this, >
One thing I am a bit worried about this fix is that after this for parallel-mode, we will maintain partitionOids at two places, one in plannedstmt->relationOids and the other in plannedstmt->partitionOids. I guess you have to do that because, in AcquireExecutorLocks, we can't find which relationOids are corresponding to partitionOids, am I right? If so, why not just maintain them in plannedstmt->partitionOids and then make PlanCacheRelCallback consider it? Also, in standard_planner, we should add these partitionOids only for parallel-mode. Thoughts? -- With Regards, Amit Kapila.