On Mon, Mar 1, 2021 at 9:08 AM Greg Nancarrow <gregn4...@gmail.com> wrote: > > Posting an updated set of patches that includes Amit Langote's patch > to the partition tracking scheme... >
Few comments: ============== 1. "Prior to entering parallel-mode for execution of INSERT with parallel SELECT, a TransactionId is acquired and assigned to the current transaction state which is then serialized in the parallel DSM for the parallel workers to use." This point in the commit message seems a bit misleading to me because IIUC we need to use transaction id only in the master backend for the purpose of this patch. And, we are doing this at an early stage because we don't allow to allocate it in parallel-mode. I think here we should mention in some way that this has a disadvantage that if the underneath select doesn't return any row then this transaction id allocation would not be of use. However, that shouldn't happen in practice in many cases. But, if I am missing something and this is required by parallel workers then we can ignore what I said. 2. /* + * Prepare for entering parallel mode by assigning a + * FullTransactionId, to be included in the transaction state that is + * serialized in the parallel DSM. + */ + (void) GetCurrentTransactionId(); + } Similar to the previous comment this also seems to indicate that we require TransactionId for workers. If that is not the case then this comment also needs to be modified. 3. static int common_prefix_cmp(const void *a, const void *b); - /***************************************************************************** Spurious line removal. 4. * Assess whether it's feasible to use parallel mode for this query. We * can't do this in a standalone backend, or if the command will try to - * modify any data, or if this is a cursor operation, or if GUCs are set - * to values that don't permit parallelism, or if parallel-unsafe - * functions are present in the query tree. + * modify any data using a CTE, or if this is a cursor operation, or if + * GUCs are set to values that don't permit parallelism, or if + * parallel-unsafe functions are present in the query tree. This comment change is not required because this is quite similar to what we do for CTAS. Your further comment changes in this context are sufficient. 5. + (IsModifySupportedInParallelMode(parse->commandType) && + is_parallel_possible_for_modify(parse))) && I think it would be better if we move the check IsModifySupportedInParallelMode inside is_parallel_possible_for_modify. Also, it might be better to name this function as is_parallel_allowed_for_modify. 6. @@ -260,6 +265,21 @@ set_plan_references(PlannerInfo *root, Plan *plan) */ add_rtes_to_flat_rtable(root, false); + /* + * If modifying a partitioned table, add its parallel-safety-checked + * partitions too to glob->relationOids, to register them as plan + * dependencies. This is only really needed in the case of a parallel + * plan, so that if parallel-unsafe properties are subsequently defined + * on the partitions, the cached parallel plan will be invalidated and + * a non-parallel plan will be generated. + */ + if (IsModifySupportedInParallelMode(root->parse->commandType)) + { + if (glob->partitionOids != NIL && glob->parallelModeNeeded) + glob->relationOids = + list_concat(glob->relationOids, glob->partitionOids); + } + Isn't it possible to add these partitionOids in set_plan_refs with the T_Gather(Merge) node handling? That looks like a more natural place to me, if that is feasible then we don't need parallelModeNeeded check and maybe we don't need to even check IsModifySupportedInParallelMode but I am less sure of the second check requirement. 7. +#include "access/table.h" +#include "access/xact.h" #include "access/transam.h" +#include "catalog/pg_class.h" #include "catalog/pg_type.h" #include "nodes/makefuncs.h" #include "nodes/nodeFuncs.h" @@ -24,6 +27,8 @@ #include "optimizer/planmain.h" #include "optimizer/planner.h" #include "optimizer/tlist.h" +#include "parser/parsetree.h" +#include "partitioning/partdesc.h" I think apart from xact.h, we don't need new additional includes. 8. I see that in function target_rel_max_parallel_hazard, we don't release the lock on the target table after checking parallel-safety but then in function target_rel_max_parallel_hazard_recurse, we do release the lock on partition tables after checking their parallel-safety. Can we add some comments to explain these two cases? 9. I noticed that the tests added for the first patch in v18-0002-Parallel-SELECT-for-INSERT-INTO-.-SELECT-tests-and-doc take even more time than select_parallel. I think we should see if we can reduce the timing of this test. I haven't studied it in detail but maybe some Inserts execution can be avoided. In some cases like below just checking the plan should be sufficient. I think you can try to investigate and see how much we can reduce it without affecting on code-coverage of newly added code. +-- +-- Parallel unsafe column default, should not use a parallel plan +-- +explain (costs off) insert into testdef(a,c,d) select a,a*4,a*8 from test_data; + QUERY PLAN +----------------------------- + Insert on testdef + -> Seq Scan on test_data +(2 rows) + +insert into testdef(a,c,d) select a,a*4,a*8 from test_data; +select * from testdef order by a; + a | b | c | d +----+---+----+---- + 1 | 5 | 4 | 8 + 2 | 5 | 8 | 16 + 3 | 5 | 12 | 24 + 4 | 5 | 16 | 32 + 5 | 5 | 20 | 40 + 6 | 5 | 24 | 48 + 7 | 5 | 28 | 56 + 8 | 5 | 32 | 64 + 9 | 5 | 36 | 72 + 10 | 5 | 40 | 80 +(10 rows) + -- With Regards, Amit Kapila.