Re: [HACKERS] parallelize queries containing initplans

Amit Kapila Mon, 13 Nov 2017 21:00:53 -0800

On Tue, Nov 14, 2017 at 2:39 AM, Robert Haas <[email protected]> wrote:
> On Sat, Nov 11, 2017 at 7:19 AM, Amit Kapila <[email protected]> wrote:
>> I have tried to follow the practice we have used for param extern
>> params (SerializeParamList is in params.c) and most of the handling of
>> initplans is done in nodeSubplan.c, so I choose to keep the newly
>> added functions there.  However, I think keeping it in execParallel.c
>> is also okay as we do it for serialize plan.
>
> To me it feels like there is only a very loose coupling between this
> code and what's currently in nodeSubplan.c, but maybe I'm wrong.  I am
> also not sure execParallel.c is the perfect place either; we might
> eventually split execParallel.c into multiple files if it accumulates
> too many things that are too different from each other.
>


Another possibility is subselect.c, because it contains initplan
stuff, however, most of the things are related to planning, so not
sure.  I think nodeSubplan.c won't be a bad choice as it is mentioned
in the file header that it is concerned about initplan stuff.

>> I think it would need some work in execution as well because the size
>> won't be fixed in that case for varchar type of params.  We might end
>> up with something different as well.
>
> Hmm, true.  But don't we also have that problem with the patch as
> written?  I mean, didn't we determine at some point that the value of
> an InitPlan can change, if the initplan depends on a parameter that is
> correlated but not directly correlated?  The existence of
> ExecReScanSetParamPlan() seems to suggest that this is indeed
> possible, and that means that the parameter could be reevaluated and
> evaluate, on the second time through, to a value that requires more
> storage than before.  That would be a problem, because
> ExecParallelReInitializeDSM reuses the old DSM.
>

I think this would have been a matter of concern if we use initplans
below Gather/Gather Merge.  The patch uses initplans which are between
current query level and root.  So, I think we don't need to reevaluate
such parameters.  Let us try to see it via example:


                                       QUERY PLAN
----------------------------------------------------------------------------------------
 Aggregate  (cost=62.08..62.08 rows=1 width=8)
   InitPlan 1 (returns $1)
     ->  Finalize Aggregate  (cost=20.64..20.65 rows=1 width=8)
           ->  Gather  (cost=20.63..20.64 rows=2 width=8)
                 Workers Planned: 2
                 ->  Partial Aggregate  (cost=20.63..20.64 rows=1 width=8)
                       ->  Parallel Seq Scan on t3  (cost=0.00..18.50
rows=850 width=4)
   ->  Hash Join  (cost=20.75..41.42 rows=1 width=4)
         Hash Cond: (t1.j = t2.j)
         ->  Gather  (cost=0.00..20.63 rows=10 width=12)
               Workers Planned: 2
               Params Evaluated: $1
               ->  Parallel Seq Scan on t1  (cost=0.00..20.63 rows=4 width=12)
                     Filter: (k = $1)
         ->  Hash  (cost=20.63..20.63 rows=10 width=8)
               ->  Gather  (cost=0.00..20.63 rows=10 width=8)
                     Workers Planned: 2
                     Params Evaluated: $1
                     ->  Parallel Seq Scan on t2  (cost=0.00..20.63
rows=4 width=8)
                           Filter: (k = $1)
(20 rows)


Now, here even if initplan would have been an undirect correlated
plan, it wouldn't have been a problem, because we don't need to
reevaluate it for Gather node below Hash.

Am I missing something?  Do you have some test or shape of the plan in
mind which can cause a problem?

>>>  I broke a lot of long lines in your version of
>>> the patch into multiple lines; please try to be attentive to this
>>> issue when writing patches in general, as it is a bit tedious to go
>>> through and insert line breaks in many places.
>>
>> Okay, but I sometimes rely on pgindent for such things as for few
>> things it becomes difficult to decide which way it will be better.
>
> pgindent doesn't in general break long lines.  There are a few cases
> where it does, like reflowing comments.  But mostly you have to do
> that manually.  For instance, if you write a =
> verylongfunctionname(longargument(thing), stuff(thing), foobar(thing,
> thing, thing)) it's going to keep all that on one line.  If you insert
> line breaks between the arguments it will keep them, though.  I think
> it's worth doing something like:
>
> git diff | awk 'length($0)>81'
>
> on your patches before you submit them.  If you see things in there
> that can be reformatted to make the long lines shorter, do it.
>

Okay, point noted.

>> I have fixed the first two in attached patch and left the last one as
>> I was not sure what you have in mind
>
> Oh, yeah, that should be changed too, something like "Serialize
> PARAM_EXEC parameters."  Sorry, I thought I caught all of the
> references to initplans specifically, but I guess I missed a few.
>

No issues, I can go through it once more to change the comments
wherever initplans is used after we settle on the above discussion.
If we decide that initplan and other PARAM_EXEC params need different
treatment, then we might want to retain initplans in the comments.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] parallelize queries containing initplans

Reply via email to