On Sun, 31 Jan 2021 at 11:42, Tom Lane <t...@sss.pgh.pa.us> wrote: > > For simplicity of review I divided the patch into two parts. > 0001 revises make_partition_pruneinfo() and children to identify > the relevant parent partitions for themselves, which is not too > hard to do by chasing up the child-to-parent AppendRelInfo links. > Not formerly documented, AFAICT, is that we want to collect just > the parent partitions that are between the Append path's own rel > (possibly equal to it) and the subpaths' rels. I'd first tried > to code this by using the top_parent_relids and part_rels[] links > in the RelOptInfos, but that turns out not to work. We might > ascend to a top parent that's above the Append's rel (if considering > an appendpath for a sub-partition, which happens in partitionwise > join). We could also descend to a child at or below some subpath > level, since there are also cases where subpaths correspond to > unflattened non-leaf partitions. Either of those things result > in failures. But once you wrap your head around handling those > restrictions, it's quite simple.
I had a look at this one and it all makes sense and the logic for obtaining the lineage of parent partitioned tables seems fine. What I can't understand is why you changed to a List-of-Lists rather than a List-of-Relids. This makes the patch both bigger than it needs to be and the processing quite a bit less efficient. For example, in make_partition_pruneinfo() when you're walking up to the top-level target partitioned table you must lcons() each new list member to ensure the children always come after the parents. These chains are likely to be short so the horrible overheads of the memmove() in lcons() won't cost that much, but there's more to it. The other seemingly needlessly slow part is in the list_concat_unique_ptr() call. This needs to be done for every subpath in the Append/Merge append. It would be good to get rid of that. Given, the list are most likely going to be small, but that's still a quadratic function, so it seems like a good idea to try to avoid using it if there is another way to do it. The memory allocations could also be more efficient for Relids rather than Lists. Since we're working up from the child to the parent in the lineage calculation code in make_partition_pruneinfo(), we'll always allocate the Bitmapset to the correct size right away, rather than possibly having to increase the size if the next RT index were not to fit in the current number of words. Of course, with a List-of-Lists, not every lcons() would require a new allocation, but there's an above zero chance that it might. I've attached a version of your 0001 patch which just maintains using a List-of-Relids. This shrinks the diff down about 3kb. Parent RT indexes are guaranteed to be lower than their children RT indexes, so it's pretty simple to figure out the target RT index by just looking at the lowest set bit. Doing it this way also simplifies things as add_part_rel_list() no longer must insist that the sublists are in parent-to-child order. David
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c index 42c7c5f554..447bc2f78e 100644 --- a/src/backend/partitioning/partprune.c +++ b/src/backend/partitioning/partprune.c @@ -138,6 +138,7 @@ typedef struct PruneStepResult } PruneStepResult; +static List *add_part_rel_list(List *partrellists, Relids partrels); static List *make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel, int *relid_subplan_map, @@ -213,59 +214,94 @@ static void partkey_datum_from_expr(PartitionPruneContext *context, * * 'parentrel' is the RelOptInfo for an appendrel, and 'subpaths' is the list * of scan paths for its child rels. - * - * 'partitioned_rels' is a List containing Lists of relids of partitioned - * tables (a/k/a non-leaf partitions) that are parents of some of the child - * rels. Here we attempt to populate the PartitionPruneInfo by adding a - * 'prune_infos' item for each sublist in the 'partitioned_rels' list. - * However, some of the sets of partitioned relations may not require any - * run-time pruning. In these cases we'll simply not include a 'prune_infos' - * item for that set and instead we'll add all the subplans which belong to - * that set into the PartitionPruneInfo's 'other_subplans' field. Callers - * will likely never want to prune subplans which are mentioned in this field. - * - * 'prunequal' is a list of potential pruning quals. + * 'prunequal' is a list of potential pruning quals (i.e., restriction + * clauses that are applicable to the appendrel). */ PartitionPruneInfo * make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel, - List *subpaths, List *partitioned_rels, + List *subpaths, List *unused_partitioned_rels, List *prunequal) { PartitionPruneInfo *pruneinfo; Bitmapset *allmatchedsubplans = NULL; + List *partrellists; + List *prunerelinfos; int *relid_subplan_map; ListCell *lc; - List *prunerelinfos; int i; /* - * Construct a temporary array to map from planner relids to subplan - * indexes. For convenience, we use 1-based indexes here, so that zero - * can represent an un-filled array entry. + * Scan the subpaths to see which ones are scans of partition child + * relations, and make lists of their parent partitioned rels. (We must + * restrict the parent partitioned rels to be parentrel or children of + * parentrel, otherwise we couldn't translate prunequal to match.) Also + * construct a temporary array to map from partition-child-relation relid + * to the index in 'subpaths' of the scan plan for that partition. (Use + * of "subplan" rather than "subpath" is a bit of a misnomer, but we'll + * let it stand.) For convenience, we use 1-based indexes here, so that + * zero can represent an un-filled array entry. */ + partrellists = NIL; relid_subplan_map = palloc0(sizeof(int) * root->simple_rel_array_size); - /* - * relid_subplan_map maps relid of a leaf partition to the index in - * 'subpaths' of the scan plan for that partition. - */ i = 1; foreach(lc, subpaths) { Path *path = (Path *) lfirst(lc); RelOptInfo *pathrel = path->parent; - Assert(IS_SIMPLE_REL(pathrel)); - Assert(pathrel->relid < root->simple_rel_array_size); - /* No duplicates please */ - Assert(relid_subplan_map[pathrel->relid] == 0); + /* We don't consider partitioned joins here */ + if (pathrel->reloptkind == RELOPT_OTHER_MEMBER_REL) + { + RelOptInfo *prel = pathrel; + Relids partrelids = NULL; - relid_subplan_map[pathrel->relid] = i++; + /* + * Traverse up to the pathrel's topmost partitioned parent; but + * stop if we reach parentrel. (Normally, the topmost partitioned + * parent is either parentrel or a UNION ALL appendrel child of + * parentrel. But when handling partitionwise joins of + * multi-level partitioning trees, we can see an append path whose + * parentrel is an intermediate partitioned table.) + */ + do + { + AppendRelInfo *appinfo; + + Assert(prel->relid < root->simple_rel_array_size); + appinfo = root->append_rel_array[prel->relid]; + prel = find_base_rel(root, appinfo->parent_relid); + if (!IS_PARTITIONED_REL(prel)) + break; /* reached a non-partitioned parent */ + /* accept this level as an interesting parent */ + partrelids = bms_add_member(partrelids, appinfo->parent_relid); + if (prel == parentrel) + break; /* don't traverse above parentrel's level */ + } while (prel->reloptkind == RELOPT_OTHER_MEMBER_REL); + + if (!bms_is_empty(partrelids)) + { + /* + * Found some relevant parent partitions, which may or may not + * overlap with partition trees we already found. Add new + * information to the partrellists list-of-Relids. + */ + partrellists = add_part_rel_list(partrellists, partrelids); + /* Also record the subplan in relid_subplan_map[] */ + /* No duplicates please */ + Assert(relid_subplan_map[pathrel->relid] == 0); + relid_subplan_map[pathrel->relid] = i; + } + } + i++; } - /* We now build a PartitionedRelPruneInfo for each partitioned rel. */ + /* + * We now build a PartitionedRelPruneInfo for each topmost partitioned rel + * (omitting any that turn out not to have useful pruning quals). + */ prunerelinfos = NIL; - foreach(lc, partitioned_rels) + foreach(lc, partrellists) { Relids partrelids = (Relids) lfirst(lc); List *pinfolist; @@ -299,7 +335,7 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel, pruneinfo->prune_infos = prunerelinfos; /* - * Some subplans may not belong to any of the listed partitioned rels. + * Some subplans may not belong to any of the identified partitioned rels. * This can happen for UNION ALL queries which include a non-partitioned * table, or when some of the hierarchies aren't run-time prunable. Build * a bitmapset of the indexes of all such subplans, so that the executor @@ -321,23 +357,70 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel, return pruneinfo; } +/* + * add_part_rel_list + * Add new partitioning info to a list of Relids of partitioned rels. + * + * Within partrellists, there is one set of Relids for each topmost parent + * partitioned rel. Each set of Relids contains each child non-leaf partition + * it has. Parent partitioned tables always have a lower RT index than their + * child partitions and partitioned tables, so we can always identify the + * target partitioned table by looking for the lowest set bit in each Relids. + * + * Note that the result contains only RT Indexes that are parents of some + * scan-level relation appearing in the 'subpaths' that + * make_partition_pruneinfo() is dealing with. Also, "topmost" parents are + * not allowed to be higher than the 'parentrel' associated with the append + * path. In this way, we avoid expending cycles on partitioned rels that + * can't contribute useful pruning information for the problem at hand. (Note + * that it is possible for 'parentrel' to be a child partitioned table, and it + * is also possible for scan-level relations to be child partitioned tables + * rather than leaf partitions. Hence we must construct this relation set + * with reference to the particular append path we're dealing with, rather + * than looking at the full partitioning structure represented in the + * RelOptInfos.) + */ +static List * +add_part_rel_list(List *partrellists, Relids partrels) +{ + Index targetpart = bms_next_member(partrels, -1); + ListCell *lc; + + /* Look for a matching topmost parent */ + foreach(lc, partrellists) + { + Relids currpartrelids = (Relids) lfirst(lc); + Index currtarget = bms_next_member(currpartrelids, -1); + + if (targetpart == currtarget) + { + /* Found a match, so add any new children RT indexes */ + currpartrelids = bms_add_members(currpartrelids, partrels); + lfirst(lc) = currpartrelids; + return partrellists; + } + } + /* No match, so add the new partition hierarchy to the list */ + return lappend(partrellists, partrels); +} + /* * make_partitionedrel_pruneinfo * Build a List of PartitionedRelPruneInfos, one for each partitioned * rel. These can be used in the executor to allow additional partition * pruning to take place. * - * Here we generate partition pruning steps for 'prunequal' and also build a - * data structure which allows mapping of partition indexes into 'subpaths' - * indexes. - * - * If no non-Const expressions are being compared to the partition key in any - * of the 'partitioned_rels', then we return NIL to indicate no run-time - * pruning should be performed. Run-time pruning would be useless since the - * pruning done during planning will have pruned everything that can be. - * - * On non-NIL return, 'matchedsubplans' is set to the subplan indexes which - * were matched to this partition hierarchy. + * parentrel: rel associated with the appendpath being considered + * relid_subplan_map[]: maps child relation relids to subplan indexes + * partrelids: Set of RT indexes representing the partitioned tables which we + * must run partition pruning on during execution. + * prunequal: potential pruning quals, represented for parentrel + * matchedsubplans: on success, receives the set of subplan indexes which + * were matched to this partition hierarchy. + * + * If we cannot find any useful run-time pruning steps, return NIL. + * However, on success, each element of partrellist will have an element + * in the result list, even if some of them are useless. */ static List * make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,