On Fri, Feb 26, 2021 at 10:37 AM Amit Langote <amitlangot...@gmail.com> wrote:
>
> I realized that there is a race condition that will allow a concurrent
> backend to invalidate a parallel plan (for insert into a partitioned
> table) at a point in time when it's too late for plancache.c to detect
> it.  It has to do with how plancache.c locks the relations affected by
> a cached query and its cached plan.  Specifically,
> AcquireExecutorLocks(), which locks the relations that need to be
> locked before the plan could be considered safe to execute, does not
> notice the partitions that would have been checked for parallel safety
> when the plan was made.  Given that AcquireExecutorLocks() only loops
> over PlannedStmt.rtable and locks exactly the relations found there,
> partitions are not locked.  This means that a concurrent backend can,
> for example, add an unsafe trigger to a partition before a parallel
> worker inserts into it only to fail when it does.
>
> Steps to reproduce (tried with v19 set of patches):
>
> drop table if exists rp, foo;
> create table rp (a int) partition by range (a);
> create table rp1 partition of rp for values from (minvalue) to (0);
> create table rp2 partition of rp for values from (0) to (maxvalue);
> create table foo (a) as select generate_series(1, 1000000);
> set plan_cache_mode to force_generic_plan;
> set enable_parallel_dml to on;
> deallocate all;
> prepare q as insert into rp select * from foo where a%2 = 0;
> explain analyze execute q;
>
> At this point, attach a debugger to this backend and set a breakpoint
> in AcquireExecutorLocks() and execute q again:
>
> -- will execute the cached plan
> explain analyze execute q;
>
> Breakpoint will be hit.  Continue till return from the function and
> stop. Start another backend and execute this:
>
> -- will go through, because no lock still taken on the partition
> create or replace function make_table () returns trigger language
> plpgsql as $$ begin create table bar(); return null; end; $$ parallel
> unsafe;
> create trigger ai_rp2 after insert on rp2 for each row execute
> function make_table();
>
> Back to the debugger, hit continue to let the plan execute.  You
> should see this error:
>
> ERROR:  cannot start commands during a parallel operation
> CONTEXT:  SQL statement "create table bar()"
> PL/pgSQL function make_table() line 1 at SQL statement parallel worker
>
> The attached patch fixes this,
>

One thing I am a bit worried about this fix is that after this for
parallel-mode, we will maintain partitionOids at two places, one in
plannedstmt->relationOids and the other in plannedstmt->partitionOids.
I guess you have to do that because, in AcquireExecutorLocks, we can't
find which relationOids are corresponding to partitionOids, am I
right? If so, why not just maintain them in plannedstmt->partitionOids
and then make PlanCacheRelCallback consider it? Also, in
standard_planner, we should add these partitionOids only for
parallel-mode.

Thoughts?

-- 
With Regards,
Amit Kapila.


Reply via email to