Re: Parallel INSERT (INTO ... SELECT ...)

Amit Kapila Tue, 23 Feb 2021 20:12:22 -0800

On Wed, Feb 24, 2021 at 8:41 AM Greg Nancarrow <[email protected]> wrote:
>
> On Tue, Feb 23, 2021 at 10:53 PM Amit Kapila <[email protected]> wrote:
> >
> > > But the non-parallel plan was chosen (instead of a parallel plan)
> > > because of parallel-safety checks on the partitions, which found
> > > attributes of the partitions which weren't parallel-safe.
> > > So it's not so clear to me that the dependency doesn't exist - the
> > > non-parallel plan does in fact depend on the state of the partitions.
> > >
> >
> > Hmm, I think that is not what we can consider as a dependency.
> >
>
> Then if it's not a dependency, then we shouldn't have to check the
> attributes of the partitions for parallel-safety, to determine whether
> we must use a non-parallel plan (or can use a parallel plan).
> Except, of course, we do have to ...
>


I don't think the plan-dependency and checking for parallel-safety are
directly related.

> > > I know you're suggesting to reduce plan-cache invalidation by only
> > > recording a dependency in the parallel-plan case, but I've yet to see
> > > that in the existing code, and in fact it seems to be inconsistent
> > > with the behaviour I've tested so far (one example given prior, but
> > > will look for more).
> > >
> >
> > I don't see your example matches what you are saying as in it the
> > regular table exists in the plan whereas for the case we are
> > discussing partitions doesn't exist in the plan. Amit L. seems to have
> > given a correct explanation [1] of why we don't need to invalidate for
> > non-parallel plans which match my understanding.
> >
> > [1] - 
> > https://www.postgresql.org/message-id/CA%2BHiwqFKJfzgBbkg0i0Fz_FGsCiXW-Fw0tBjdsaUbNbpyv0JhA%40mail.gmail.com
> >
>
> Amit L's explanation was:
>
>     I may be wrong but it doesn't seem to me that the possibility of
>     constructing a better plan due to a given change is enough reason for
>     plancache.c to invalidate plans that depend on that change. AIUI,
>     plancache.c only considers a change interesting if it would *break* a
>     Query or a plan.
>
>     So in this case, a non-parallel plan may be slower, but it isn't
>     exactly rendered *wrong* by changes that make a parallel plan
>     possible.
>
>
> This explanation doesn't seem to match existing planner behavior
> AFAICS, and we should try to be consistent with existing behavior
> (rather than doing something different, for partitions specifically in
> this case).
>

I still think it matches. You have missed the important point in your
example and explanation which Amit L and I am trying to explain to
you. See below.

> Using a concrete example, to avoid any confusion, consider the
> following SQL (using unpatched Postgres, master branch):
>
>
> -- encourage use of parallel plans, where possible
> set parallel_setup_cost=0;
> set parallel_tuple_cost=0;
> set min_parallel_table_scan_size=0;
> set max_parallel_workers_per_gather=4;
>
> create or replace function myfunc() returns boolean as $$
>     begin
>         return true;
>     end;
> $$ language plpgsql parallel unsafe immutable;
>
> create table mytable(x int);
> insert into mytable(x) values(generate_series(1,10));
>
> prepare q as select * from mytable, myfunc();
> explain execute q;
>
> -- change myfunc to be parallel-safe, to see the effect
> -- on the planner for the same select
> create or replace function myfunc() returns boolean as $$
>     begin
>         return true;
>     end;
> $$ language plpgsql parallel safe immutable;
>
> explain execute q;
>
>
> Here a function referenced in the SELECT statement is changed from
> parallel-unsafe to parallel-safe, to see the effect on plancache.
> According to your referenced explanation, that shouldn't be considered
> an "interesting" change by plancache.c, as it wouldn't "break" the
> previously planned and cached (non-parallel) query - the existing
> non-parallel plan could and should be used, as it still would execute
> OK, even if slower. Right?
>

No that is not what I said or maybe you have misunderstood. In your
example, if you use the verbose option, then you will see the output
as below.

postgres=# explain (verbose) execute q;
                            QUERY PLAN
------------------------------------------------------------------
 Seq Scan on public.mytable  (cost=0.00..35.50 rows=2550 width=5)
   Output: mytable.x, true
(2 rows)

Here, you can see that Plan depends on function (it's return value),
so it needs to invalidated when the function changes. To, see it in a
bit different way, if you change your function as below then, you can
clearly see FunctionScan in the plan:

create or replace function myfunc(c1 int) returns int as $$
    begin
        return c1 * 10;
    end;
$$ language plpgsql parallel unsafe;

postgres=# prepare q as select * from mytable, myfunc(2);
PREPARE
postgres=# explain select * from mytable, myfunc(2);
                           QUERY PLAN
-----------------------------------------------------------------
 Nested Loop  (cost=0.25..61.26 rows=2550 width=8)
   ->  Function Scan on myfunc  (cost=0.25..0.26 rows=1 width=4)
   ->  Seq Scan on mytable  (cost=0.00..35.50 rows=2550 width=4)
(3 rows)

> BUT - for the above example, it DOES cause the query to be replanned,
> and it then uses a parallel plan, instead of keeping the original
> non-parallel plan.
> This does not match the explanation about how plancache works.
>

As explained above and as far as I can understand it matches Amit L's
explanation because the plan depends on the function output.

>
>
> So what I am saying is why should this behavior be different for our
> partition case?
> Doing something different for partitions would be inconsistent with
> this behavior, would it not?
>

No, please show an example of the Insert case where the plan has
reference to partitions. If that happens then we should add those as a
dependency but AFAICT that is not the case here.


-- 
With Regards,
Amit Kapila.

Re: Parallel INSERT (INTO ... SELECT ...)

Reply via email to