On 30 August 2018 at 21:29, Amit Langote <langote_amit...@lab.ntt.co.jp> wrote: > Attached updated patches, with 0002 containing the changes mentioned above.
Here's my first pass review on this: 0001: 1. I think the following paragraph should probably mention something about some performance difference between the two methods: <para> Constraint exclusion works in a very similar way to partition pruning, except that it uses each table's <literal>CHECK</literal> constraints — which gives it its name — whereas partition pruning uses the table's partition bounds, which exist only in the case of declarative partitioning. Another difference is that constraint exclusion is only applied at plan time; there is no attempt to remove partitions at execution time. </para> Perhaps tagging on. "Constraint exclusion is also a much less efficient way of eliminating unneeded partitions as metadata for each partition must be loaded in the planner before constraint exclusion can be applied. This is not a requirement for partition pruning." 2. I think "rootrel" should be named "targetpart" in: + RelOptInfo *rootrel = root->simple_rel_array[root->parse->resultRelation]; 3. Why does the following need to list_copy()? + List *saved_join_info_list = list_copy(root->join_info_list); 4. Is the "root->parse->commandType != CMD_INSERT" required in: @@ -181,13 +185,30 @@ make_one_rel(PlannerInfo *root, List *joinlist) /* * Generate access paths for the entire join tree. + * + * If we're doing this for an UPDATE or DELETE query whose target is a + * partitioned table, we must do the join planning against each of its + * leaf partitions instead. */ - rel = make_rel_from_joinlist(root, joinlist); + if (root->parse->resultRelation && + root->parse->commandType != CMD_INSERT && + root->simple_rel_array[root->parse->resultRelation] && + root->simple_rel_array[root->parse->resultRelation]->part_scheme) + { Won't the simple_rel_array entry for the resultRelation always be NULL for an INSERT? 5. In regards to: + /* + * Hack to make the join planning code believe that 'partrel' can + * be joined against. + */ + partrel->reloptkind = RELOPT_BASEREL; Have you thought about other implications of join planning for other member rels, for example, equivalence classes and em_is_child? 6. It would be good to not have to rt_fetch the same RangeTblEntry twice here: @@ -959,7 +969,9 @@ subquery_planner(PlannerGlobal *glob, Query *parse, * needs special processing, else go straight to grouping_planner. */ if (parse->resultRelation && - rt_fetch(parse->resultRelation, parse->rtable)->inh) + rt_fetch(parse->resultRelation, parse->rtable)->inh && + rt_fetch(parse->resultRelation, parse->rtable)->relkind != + RELKIND_PARTITIONED_TABLE) inheritance_planner(root); 7. Why don't you just pass the Parse into the function as a parameter instead of overwriting PlannerInfo's copy in: + root->parse = partition_parse; + partitionwise_adjust_scanjoin_target(root, child_rel, + subroots, + partitioned_rels, + resultRelations, + subpaths, + WCOLists, + returningLists, + rowMarks); + /* Restore the Query for processing the next partition. */ + root->parse = parse; 8. Can you update the following comment to mention why you're not calling add_paths_to_append_rel for this case? @@ -6964,7 +7164,9 @@ apply_scanjoin_target_to_paths(PlannerInfo *root, } /* Build new paths for this relation by appending child paths. */ - if (live_children != NIL) + if (live_children != NIL && + !(rel->reloptkind == RELOPT_BASEREL && + rel->relid == root->parse->resultRelation)) add_paths_to_append_rel(root, rel, live_children); 9. The following should use >= 0, not > 0 + while ((relid = bms_next_member(child_rel->relids, relid)) > 0) 0002: 10. I think moving the PlannerInfo->total_table_pages calculation needs to be done in its own patch. This is a behavioural change where we no longer count pruned relations in the calculation which can result in plan changes. I posted a patch in [1] to fix this as a bug fix as I think the current code is incorrect. My patch also updates the first paragraph of the comment. You've not done that. 11. "pruning" + * And do prunin. Note that this adds AppendRelInfo's of only the 12. It's much more efficient just to do bms_add_range() outside the loop in: + for (i = 0; i < rel->nparts; i++) + { + rel->part_rels[i] = build_partition_rel(root, rel, + rel->part_oids[i]); + rel->live_parts = bms_add_member(rel->live_parts, i); + } 13. In set_append_rel_size() the foreach(l, root->append_rel_list) loop could be made to loop over RelOptInfo->live_parts instead which would save having to skip over AppendRelInfos that don't belong to this parent. You'd need to make live_parts more general purpose and also use it to mark children of inheritance parents. 14. I think you can skip the following if both are NULL. You could likely add more smarts for different join types, but I think you should at least skip when both are NULL. Perhaps the loop could be driven off of bms_intersec of the two Rel's live_parts? + if (child_rel1 == NULL) + child_rel1 = build_dummy_partition_rel(root, rel1, cnt_parts); + if (child_rel2 == NULL) + child_rel2 = build_dummy_partition_rel(root, rel2, cnt_parts); 15. The following is not required when append_rel_array was previously NULL. + MemSet(root->append_rel_array + root->simple_rel_array_size, + 0, sizeof(AppendRelInfo *) * num_added_parts); 16. I don't think scan_all_parts is worth the extra code. The cost of doing bms_num_members is going to be pretty small in comparison to building paths for and maybe doing a join search for all partitions. + num_added_parts = scan_all_parts ? rel->nparts : + bms_num_members(partindexes); In any case, you've got a bug in prune_append_rel_partitions() where you're setting scan_all_parts to true instead of returning when contradictory == true. 17. lockmode be of type LOCKMODE, not int. + Oid childOID, int lockmode, 18. Comment contradicts the code. + /* Open rel if needed; we already have required locks */ + if (childOID != parentOID) + { + childrel = heap_open(childOID, lockmode); I think you should be using NoLock here. 19. Comment contradicts the code. + /* Close child relations, but keep locks */ + if (childOID != parentOID) + { + Assert(childrel != NULL); + heap_close(childrel, lockmode); + } 20. I assume translated_vars can now be NIL due to build_dummy_partition_rel() not setting it. - if (var->varlevelsup == 0 && appinfo) + if (var->varlevelsup == 0 && appinfo && appinfo->translated_vars) It might be worth writing a comment to explain that, otherwise it's not quite clear why you're doing this. 21. Unrelated change; Assert(relid > 0 && relid < root->simple_rel_array_size); + 22. The following comment mentions that Oids are copied, but that does not happen in the code. + /* + * For partitioned tables, we just allocate space for RelOptInfo's. + * pointers for all partitions and copy the partition OIDs from the + * relcache. Actual RelOptInfo is built for a partition only if it is + * not pruned. + */ The Oid copying already happened during get_relation_info(). 23. Traditionally translated_vars populated with a sequence of Vars in the same order to mean no translation. Here you're changing how that works: + /* leaving translated_vars to NIL to mean no translation needed */ This seems to be about the extent of your documentation on this, which is not good enough. 24. "each" -> "reach"? ... Actually, I don't understand the comment. In a partitioned hierarchy, how can the one before the top-level partitioned table not be a partitioned table? /* * Keep moving up until we each the parent rel that's not a * partitioned table. The one before that one would be the root * parent. */ 25. "already" + * expand_inherited_rtentry alreay locked all partitions, so pass 26. Your changes to make_partitionedrel_pruneinfo() look a bit broken. You're wrongly assuming live_parts is the same as present_parts. If a CHECK constraint eliminated the partition then those will be present in live_parts but won't be part of the Append/MergeAppend subplans. You might be able to maintain some of this optimisation by checking for dummy rels inside the loop, but you're going to need to put back the code that sets present_parts. + present_parts = bms_copy(subpart->live_parts); 27. Comment contradicts the code: + Bitmapset *live_parts; /* unpruned parts; NULL if all are live */ in add_rel_partitions_to_query() you're doing: + if (scan_all_parts) + { + for (i = 0; i < rel->nparts; i++) + { + rel->part_rels[i] = build_partition_rel(root, rel, + rel->part_oids[i]); + rel->live_parts = bms_add_member(rel->live_parts, i); + } + } so the NULL part seems untrue. 28. Missing comments: + TupleDesc tupdesc; + Oid reltype; 29. The comment for prune_append_rel_partitions claims it "Returns RT indexes", but that's not the case, per: -Relids -prune_append_rel_partitions(RelOptInfo *rel) +void +prune_append_rel_partitions(PlannerInfo *root, RelOptInfo *rel) 0003: 30. 2nd test can be tested inside the first test to remove redundant partition check. - inhOIDs = find_all_inheritors(parentOID, lockmode, NULL); + if (rte->relkind != RELKIND_PARTITIONED_TABLE) + inhOIDs = find_all_inheritors(parentOID, lockmode, NULL); /* * Check that there's at least one descendant, else treat as no-child * case. This could happen despite above has_subclass() check, if table * once had a child but no longer does. */ - if (list_length(inhOIDs) < 2) + if (rte->relkind != RELKIND_PARTITIONED_TABLE && list_length(inhOIDs) < 2) { 31. The following code is wrong: + /* Determine the correct lockmode to use. */ + if (rootRTindex == root->parse->resultRelation) + lockmode = RowExclusiveLock; + else if (rootrc && RowMarkRequiresRowShareLock(rootrc->markType)) + lockmode = RowShareLock; + else + lockmode = AccessShareLock; rootRTIndex remains at 0 if there are no row marks and resultRelation will be 0 for SELECT queries, this means you'll use a RowExclusiveLock for a SELECT instead of an AccessShareLock. Why not just check the lockmode of the parent and use that? [1] https://commitfest.postgresql.org/19/1768/ -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services