Re: Parallel INSERT (INTO ... SELECT ...)

Amit Kapila Fri, 25 Sep 2020 05:18:09 -0700

On Fri, Sep 25, 2020 at 10:02 AM Greg Nancarrow <[email protected]> wrote:
>
> Hi Andres,
>
> On Thu, Sep 24, 2020 at 12:21 PM Andres Freund <[email protected]> wrote:
> >
>
>
> >> @@ -116,7 +117,7 @@ toast_save_datum(Relation rel, Datum value,
> >>       TupleDesc       toasttupDesc;
> >>       Datum           t_values[3];
> >>       bool            t_isnull[3];
> >> -     CommandId       mycid = GetCurrentCommandId(true);
> >> +     CommandId       mycid = GetCurrentCommandId(!IsParallelWorker());
> >>       struct varlena *result;
> >>       struct varatt_external toast_pointer;
> >>       union
> >
> >Hm? Why do we need this in the various places you have made this change?
>
> It's because for Parallel INSERT, we're assigning the same command-id
> to each worker up-front during worker initialization (the commandId
> has been retrieved by the leader and passed through to each worker)
> and "currentCommandIdUsed" has been set true. See the
> AssignCommandIdForWorker() function in the patch.
> If you see the code of GetCurrentCommandId(), you'll see it Assert
> that it's not being run by a parallel worker if the parameter is true.
> I didn't want to remove yet another check, without being able to know
> the context of the caller, because only for Parallel INSERT do I know
> that "currentCommandIdUsed was already true at the start of the
> parallel operation". See the comment in that function. Anyway, that's
> why I'm passing "false" to relevant GetCurrentCommandId() calls if
> they're being run by a parallel (INSERT) worker.
>


But we can tighten the condition in GetCurrentCommandId() such that it
Asserts for parallel worker only when currentCommandIdUsed is not set
before start of parallel operation. I also find these changes in the
callers of GetCurrentCommandId() quite adhoc and ugly even if they are
correct. Also, why we don't face a similar problems for parallel copy?

>
> >> diff --git a/src/backend/access/transam/varsup.c 
> >> b/src/backend/access/transam/varsup.c
> >> index a4944fa..9d3f100 100644
> >> --- a/src/backend/access/transam/varsup.c
> >> +++ b/src/backend/access/transam/varsup.c
> >> @@ -53,13 +53,6 @@ GetNewTransactionId(bool isSubXact)
> >>       TransactionId xid;
> >>
> >>       /*
> >> -      * Workers synchronize transaction state at the beginning of each 
> >> parallel
> >> -      * operation, so we can't account for new XIDs after that point.
> >> -      */
> >> -     if (IsInParallelMode())
> >> -             elog(ERROR, "cannot assign TransactionIds during a parallel 
> >> operation");
> >> -
> >> -     /*
> >>        * During bootstrap initialization, we return the special bootstrap
> >>        * transaction id.
> >>        */
> >
> >Same thing, this code cannot just be allowed to be reachable. What
> >prevents you from assigning two different xids from different workers
> >etc?
>
> At least in the case of Parallel INSERT, the leader for the Parallel
> INSERT gets a new xid (GetCurrentFullTransactionId) and it is passed
> through and assigned to each of the workers during their
> initialization (so they are assigned the same xid).
>

So are you facing problems in this area because we EnterParallelMode
before even assigning the xid in the leader? Because I don't think we
should ever reach this code in the worker. If so, there are two
possibilities that come to my mind (a) assign xid in leader before
entering parallel mode or (b) change the check so that we don't assign
the new xid in workers. In this case, I am again wondering how does
parallel copy dealing this?

-- 
With Regards,
Amit Kapila.

Re: Parallel INSERT (INTO ... SELECT ...)

Reply via email to