Thanks Andres for the comments.

On Thu, Sep 24, 2020 at 8:11 AM Andres Freund <and...@anarazel.de> wrote:
>
> > The design:
>
> I think it'd be good if you could explain a bit more why you think this
> safe to do in the way you have done it.
>
> E.g. from a quick scroll through the patch, there's not even a comment
> explaining that the only reason there doesn't need to be code dealing
> with xid assignment because we already did the catalog changes to create
> the table.
>
Yes, we do a bunch of catalog changes related to the newly created
table. Both the txn id and the command id will have been assigned by
the time those catalog changes are made. But right after the table is
created in the leader, the command id is incremented
(CommandCounterIncrement() is called from create_ctas_internal())
whereas the txn id remains the same. The new command id is marked as
used with GetCurrentCommandId(true) in intorel_startup, and then
parallel mode is entered. The txn id and command id are serialized into
the parallel DSM, so they are available to all parallel workers. This
is discussed in [1]. (A condensed sketch of this sequence is at the end
of this mail.)

A few changes I still have to make in the parallel worker code: set
currentCommandIdUsed = true, maybe via a common API
SetCurrentCommandIdUsedForWorker() proposed in [1], and remove the
extra command id sharing from the leader to the workers. I will add a
few comments related to the above in the upcoming patch.

>
> But how does that work for SELECT INTO? Are you prohibiting
> that? ...
>

In the case of SELECT INTO, a new table gets created as well, and I'm
not prohibiting the parallel inserts; I think we don't need to.
Thoughts?

>
> > Below things are still pending. Thoughts are most welcome:
> > 1. How better we can lift the "cannot insert tuples in a parallel worker"
> > from heap_prepare_insert() for only CTAS cases or for that matter parallel
> > copy? How about having a variable in any of the worker global contexts and
> > use that? Of course, we can remove this restriction entirely in case we
> > fully allow parallelism for INSERT INTO SELECT, CTAS, and COPY.
>
> And for the purpose of your question, we could then have a
> table_insert_allow_parallel(TableInsertScan *);
> or an additional arg to table_begin_insert().
>

Removing the "cannot insert tuples in a parallel worker" restriction
from heap_prepare_insert() is a problem common to parallel inserts in
general, i.e. parallel inserts in CTAS, parallel INSERT INTO SELECT [1],
and parallel COPY [2]. It would be good to agree on a common solution
(see also the second sketch at the end of this mail).

>
> > 3. Need to restrict parallel inserts, if CTAS tries to create temp/global
> > tables as the workers will not have access to those tables. Need to analyze
> > whether to allow parallelism if CTAS has prepared statements or with no
> > data.
>
> In which case does CTAS not create a table?

AFAICS, the table gets created in all cases, but the insertion of data
is skipped if the user specifies the WITH NO DATA option, in which case
the SELECT part is not even planned, and so parallelism will not be
picked either.

>
> You definitely need to
> ensure that the table is created before your workers are started, and
> there needs to be in a different CommandId.
>

Yeah, this is already being done. The table gets created in the leader
(in intorel_startup, which is called via dest->rStartup from
standard_ExecutorRun()) before parallel mode is entered.

[1] https://www.postgresql.org/message-id/CAJcOf-fn1nhEtaU91NvRuA3EbvbJGACMd4_c%2BUu3XU5VMv37Aw%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CAA4eK1%2BkpddvvLxWm4BuG_AhVvYz8mKAEa7osxp_X0d4ZEiV%3Dg%40mail.gmail.com

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
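
PS: To make the ordering described above concrete, here is a rough,
untested sketch of the sequence. CommandCounterIncrement(),
GetCurrentCommandId(), EnterParallelMode(), InitializeParallelDSM() and
StartParallelWorkerTransaction() are existing PostgreSQL functions; the
flow is condensed from the description above, not copied from the
patch:

    /* Leader, in the CTAS dest receiver startup (intorel_startup): */

    /* The table was already created by create_ctas_internal(), which
     * called CommandCounterIncrement(): new command id, same txn id. */
    (void) GetCurrentCommandId(true);   /* mark the new command id used */
    EnterParallelMode();
    /*
     * InitializeParallelDSM() then serializes the transaction state,
     * txn id and current command id included, into the DSM segment;
     * each worker restores it via StartParallelWorkerTransaction() and
     * can therefore see the freshly created table.
     */

    /* Worker, after the transaction state has been restored: */
    SetCurrentCommandIdUsedForWorker(); /* the API proposed in [1]; it
                                         * would set xact.c's static
                                         * currentCommandIdUsed, which a
                                         * worker cannot set directly */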
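
For point 1, the restriction in question is this check in heapam.c's
heap_prepare_insert():

    if (IsParallelWorker())
        ereport(ERROR,
                (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
                 errmsg("cannot insert tuples in a parallel worker")));

One possible shape of the opt-in Andres suggests would be for
table_insert_allow_parallel() (or an extra argument to
table_begin_insert()) to set a flag that reaches this spot; the
parallel_insert_allowed flag below is hypothetical, shown only to
illustrate the idea:

    if (IsParallelWorker() && !parallel_insert_allowed)
        ereport(ERROR,
                (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
                 errmsg("cannot insert tuples in a parallel worker")));

That way heapam stays read-only in workers by default, and only callers
that have arranged things safely (CTAS, parallel COPY, parallel INSERT
INTO SELECT) would enable worker-side inserts explicitly.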