On Tue, Sep 29, 2020 at 8:27 PM Dilip Kumar <dilipbal...@gmail.com> wrote: > > On Mon, Sep 28, 2020 at 8:45 AM Greg Nancarrow <gregn4...@gmail.com> wrote: > > > > On Sat, Sep 26, 2020 at 3:30 PM Bharath Rupireddy > > <bharath.rupireddyforpostg...@gmail.com> wrote: > > > > > I further checked on full txn id and command id. Yes, these are > > > getting passed to workers via InitializeParallelDSM() -> > > > SerializeTransactionState(). I tried to summarize what we need to do > > > in case of parallel inserts in general i.e. parallel COPY, parallel > > > inserts in INSERT INTO and parallel inserts in CTAS. > > > > > > In the leader: > > > GetCurrentFullTransactionId() > > > GetCurrentCommandId(true) > > > EnterParallelMode(); > > > InitializeParallelDSM() --> calls SerializeTransactionState() > > > (both full txn id and command id are serialized into parallel DSM) > > > > > > In the workers: > > > ParallelWorkerMain() --> calls StartParallelWorkerTransaction() (both > > > full txn id and command id are restored into workers' > > > CurrentTransactionState->fullTransactionId and currentCommandId) > > > If the parallel workers are meant for insertions, then we need to set > > > currentCommandIdUsed = true; Maybe we can lift the assert in > > > GetCurrentCommandId(), if we don't want to touch that function, then > > > we can have a new function GetCurrentCommandidInWorker() whose > > > functionality will be same as GetCurrentCommandId() without the > > > Assert(!IsParallelWorker());. > > > > > > Am I missing something? > > > > > > If the above points are true, we might have to update the parallel > > > copy patch set, test the use cases and post separately in the parallel > > > copy thread in coming days. > > > > > > > Hi Bharath, > > > > I pretty much agree with your above points. > > > > I've attached an updated Parallel INSERT...SELECT patch, that: > > - Only uses existing transaction state serialization support for > > transfer of xid and cid. > > - Adds a "SetCurrentCommandIdUsedForWorker" function, for setting > > currentCommandIdUsed=true at the start of a parallel operation (used > > for Parallel INSERT case, where we know the currentCommandId has been > > assigned to the worker at the start of the parallel operation). > > - Tweaks the Assert condition within "used=true" parameter case in > > GetCurrentCommandId(), so that it only fires if in a parallel worker > > and currentCommandId is false - refer to the updated comment in that > > function. > > - Does not modify any existing GetCurrentCommandId() calls. > > - Does not remove any existing parallel-related asserts/checks, except > > for the "cannot insert tuples in a parallel worker" error in > > heap_prepare_insert(). I am still considering what to do with the > > original error-check here. > > [- Does not yet cater for other exclusion cases that you and Vignesh > > have pointed out] > > > > This patch is mostly a lot cleaner, but does contain a possible ugly > > hack, in that where it needs to call GetCurrentFullTransactionId(), it > > must temporarily escape parallel-mode (recalling that parallel-mode is > > set true right at the top of ExectePlan() in the cases of Parallel > > INSERT/SELECT). > > I think you still need to work on the costing part, basically if we > are parallelizing whole insert then plan is like below > > -> Gather > -> Parallel Insert > -> Parallel Seq Scan > > That means the tuple we are selecting via scan are not sent back to > the gather node, so in cost_gather we need to see if it is for the > INSERT then there is no row transferred through the parallel queue > that mean we need not to pay any parallel tuple cost.
I just looked into the parallel CTAS[1] patch for the same thing, and I can see in that patch it is being handled. [1] https://www.postgresql.org/message-id/CALj2ACWFq6Z4_jd9RPByURB8-Y8wccQWzLf%2B0-Jg%2BKYT7ZO-Ug%40mail.gmail.com -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com