I've been working on implementing a way to perform plan-time partition pruning that is hopefully faster than the current method of using constraint exclusion to prune each of the potentially many partitions one by one. It's not fully cooked yet, though.
Meanwhile, I thought I'd share a couple of patches that restructure the planner code related to inheritance planning for partitioned tables, which I think would be helpful. They are to be applied on top of the patches being discussed at [1]. Note that these patches do not themselves implement the code that replaces constraint exclusion as the method of partition pruning. I will share that patch after debugging it some more.

The main design goal of the patches I'm sharing here is to defer the locking and opening of leaf partitions in a given partition tree to a point after set_append_rel_size() is called on the root partitioned table. Currently, AFAICS, we need to lock and open the child tables in expand_inherited_rtentry() only to set the translated_vars field in the AppendRelInfo that we create for the child. ISTM we can defer the creation of a child AppendRelInfo to the point where it (its translated_vars field in particular) will actually be used, and so lock and open the child tables only at that time.

Although with these patches we no longer lock and open the partition child tables in expand_inherited_rtentry(), their RT entries are still created and added to root->parse->rtable, so that setup_simple_rel_arrays() knows the maximum number of entries root->simple_rel_array will need to hold and allocates memory for that array accordingly. Slots in simple_rel_array[] corresponding to partition child tables stay empty until set_append_rel_size() is called on the root parent table and determines which partitions will actually be scanned; the corresponding entries are created at that point.
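
To make the "allocate every slot up front, fill it only on demand" idea concrete, here is a minimal standalone sketch. It is not planner code: DemoRel, DemoPlanner, and the demo_* functions are hypothetical stand-ins invented for illustration, and the real simple_rel_array handling may differ in its details.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a per-relation planner struct (not RelOptInfo). */
typedef struct DemoRel
{
    int         rti;        /* range-table index this entry belongs to */
} DemoRel;

/* Hypothetical stand-in for the planner state that owns the array. */
typedef struct DemoPlanner
{
    int         nslots;     /* one slot per RT entry, plus unused slot 0 */
    DemoRel   **slots;      /* NULL means "not built yet" */
    int         nbuilt;     /* number of entries built so far */
} DemoPlanner;

/* Size the array for every RT entry up front, but build nothing yet. */
DemoPlanner *
demo_planner_create(int n_rt_entries)
{
    DemoPlanner *p;

    assert(n_rt_entries >= 0);
    p = malloc(sizeof(DemoPlanner));
    if (p == NULL)
        return NULL;
    p->nslots = n_rt_entries + 1;   /* RT indexes start at 1 */
    p->nbuilt = 0;
    p->slots = calloc((size_t) p->nslots, sizeof(DemoRel *));
    if (p->slots == NULL)
    {
        free(p);
        return NULL;
    }
    return p;
}

/* Build the entry for 'rti' on first use; later calls return the same one. */
DemoRel *
demo_get_rel(DemoPlanner *p, int rti)
{
    assert(p != NULL);
    if (rti < 1 || rti >= p->nslots)
        return NULL;
    if (p->slots[rti] == NULL)
    {
        DemoRel    *rel = malloc(sizeof(DemoRel));

        if (rel == NULL)
            return NULL;
        rel->rti = rti;
        p->slots[rti] = rel;
        p->nbuilt++;
    }
    return p->slots[rti];
}

/* Release every entry that was built, then the array itself. */
void
demo_planner_free(DemoPlanner *p)
{
    int         i;

    if (p == NULL)
        return;
    for (i = 0; i < p->nslots; i++)
        free(p->slots[i]);
    free(p->slots);
    free(p);
}
```

In this toy version, a slot whose relation is never requested is never built, which mirrors the intent of not opening or locking partitions that end up not being scanned.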
The patch augments the existing PartitionedChildRelInfo node, which currently holds only the partitioned child rel RT indexes, to carry more information about the partition tree, including the information returned by RelationGetPartitionDispatchInfo() when it is called from expand_inherited_rtentry() (per the proposed patch in [1], we call it to be able to add partitions to the query tree in bound order). Since PartitionedChildRelInfo now contains more information about the partition tree than it used to, I thought the struct's name was no longer apt, so I renamed it to PartitionRootInfo and renamed root->pcinfo_list to prinfo_list accordingly. That seems okay because we only use that node internally.

Then, during the add_base_rels_to_query() step, when build_simple_rel() builds a RelOptInfo for the root partitioned table, it also initializes some newly introduced fields in RelOptInfo from the information contained in the table's PartitionRootInfo. These fields are initialized only in the RelOptInfos of root partitioned tables. Note that the add_base_rels_to_query() step won't add the partition "otherrel" RelOptInfos yet (unlike the regular inheritance case, where they are added after looking them up in root->append_rel_list).

When set_append_rel_size() is called on the root partitioned table, it calls find_partitions_for_query(), which uses the partition tree information to determine the partitions that need to be scanned for the query. This processing happens recursively: we first determine the root parent's partitions, and then, for each partition that is itself partitioned, we determine its partitions, and so on. As we determine partitions in this per-partitioned-table manner, we maintain a pair (parent_relid, list-of-partition-relids-to-scan) for each partitioned table and also a single list of all leaf partitions determined so far. Once all partitions have been determined, we turn to locking the leaf partitions.
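
The recursive, per-partitioned-table walk described above can be sketched in isolation as follows. This is only an illustration under simplifying assumptions: DemoPartNode, DemoAppendInfo, DemoResult, and demo_find_partitions() are hypothetical names, the tree lives in a fixed-size array, and no pruning is done, so every partition is selected.

```c
#include <assert.h>

#define DEMO_MAX_PARTS 4        /* max partitions per table in this sketch */
#define DEMO_MAX_RELS  16       /* max tables in the whole tree */

/* One table in a toy partition tree. */
typedef struct DemoPartNode
{
    int         relid;
    int         is_partitioned;         /* nonzero if it has partitions */
    int         nparts;
    int         parts[DEMO_MAX_PARTS];  /* indexes into the tree array */
} DemoPartNode;

/* The pair (parent_relid, immediate partitions to scan). */
typedef struct DemoAppendInfo
{
    int         parent_relid;
    int         nparts;
    int         part_relids[DEMO_MAX_PARTS];
} DemoAppendInfo;

/* Everything collected by the walk. */
typedef struct DemoResult
{
    int             ninfos;
    DemoAppendInfo  infos[DEMO_MAX_RELS];   /* root's pair comes first */
    int             nleaves;
    int             leaf_relids[DEMO_MAX_RELS];
} DemoResult;

/*
 * Record the pair for the partitioned table at tree[idx], then recurse into
 * each partition that is itself partitioned.  Leaf partitions are collected
 * in a single flat list.
 */
void
demo_find_partitions(const DemoPartNode *tree, int idx, DemoResult *res)
{
    const DemoPartNode *node = &tree[idx];
    DemoAppendInfo *info;
    int         i;

    assert(node->is_partitioned);
    assert(node->nparts <= DEMO_MAX_PARTS);
    assert(res->ninfos < DEMO_MAX_RELS);

    info = &res->infos[res->ninfos++];
    info->parent_relid = node->relid;
    info->nparts = 0;

    /* No pruning in this sketch: every immediate partition is selected. */
    for (i = 0; i < node->nparts; i++)
        info->part_relids[info->nparts++] = tree[node->parts[i]].relid;

    for (i = 0; i < node->nparts; i++)
    {
        const DemoPartNode *child = &tree[node->parts[i]];

        if (child->is_partitioned)
            demo_find_partitions(tree, node->parts[i], res);
        else
        {
            assert(res->nleaves < DEMO_MAX_RELS);
            res->leaf_relids[res->nleaves++] = child->relid;
        }
    }
}
```

Because each pair lists only immediate partitions, the parent-child structure of the tree can be read back from the list of pairs, while the flat leaf list is what a later locking step would consult.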
The locking happens in the order of OIDs that find_all_inheritors would have returned in expand_inherited_rtentry(); the list of OIDs in that original order is also stored in the table's PartitionRootInfo node. For each OID in that list, we check whether it is in the set of leaf partition OIDs that was just computed, and if so, lock it.

For all chosen partitions that are partitioned tables (including the root), we create a PartitionAppendInfo node, which stores the aforementioned pair (parent_relid, list-of-partition-relids-to-scan), and append it to a list in the root table's RelOptInfo, with the root table's PartitionAppendInfo at the head of the list. Note that the list of partitions in this pair contains only the immediate partitions, so the original parent-child relationship is reflected in the list of PartitionAppendInfos thus collected. The next patch, which will implement the actual partition pruning, will add more code that runs under find_partitions_for_query().

set_append_rel_size() processing then continues for the root partitioned table. It is at this point that we create the RelOptInfos and AppendRelInfos for partitions: first for those of the root partitioned table, and then for those of each partitioned table when set_append_rel_size() is called recursively for the latter.

Note that this is still largely a WIP patch, and the implementation details might change based on both the feedback here and the discussion at [1].

Thanks,
Amit

[1] https://www.postgresql.org/message-id/befd7ec9-8f4c-6928-d330-ab05dbf860bf%40lab.ntt.co.jp
From 567e07fa19af575ece50f607a4374c370ae7375f Mon Sep 17 00:00:00 2001 From: amit <amitlangot...@gmail.com> Date: Tue, 8 Aug 2017 18:42:30 +0900 Subject: [PATCH 1/3] Teach pg_inherits.c a bit about partitioning Both find_inheritance_children and find_all_inheritors now list partitioned child tables before non-partitioned ones and return the number of partitioned tables in an optional output argument We also now store in pg_inherits, when adding a new child, if the child is a partitioned table. Per design idea from Robert Haas --- contrib/sepgsql/dml.c | 2 +- doc/src/sgml/catalogs.sgml | 10 +++ src/backend/catalog/partition.c | 2 +- src/backend/catalog/pg_inherits.c | 157 ++++++++++++++++++++++++++------- src/backend/commands/analyze.c | 3 +- src/backend/commands/lockcmds.c | 2 +- src/backend/commands/publicationcmds.c | 2 +- src/backend/commands/tablecmds.c | 56 +++++++----- src/backend/commands/vacuum.c | 3 +- src/backend/executor/execMain.c | 3 +- src/backend/optimizer/prep/prepunion.c | 2 +- src/include/catalog/pg_inherits.h | 20 ++++- src/include/catalog/pg_inherits_fn.h | 5 +- 13 files changed, 200 insertions(+), 67 deletions(-) diff --git a/contrib/sepgsql/dml.c b/contrib/sepgsql/dml.c index b643720e36..6fc279805c 100644 --- a/contrib/sepgsql/dml.c +++ b/contrib/sepgsql/dml.c @@ -333,7 +333,7 @@ sepgsql_dml_privileges(List *rangeTabls, bool abort_on_violation) if (!rte->inh) tableIds = list_make1_oid(rte->relid); else - tableIds = find_all_inheritors(rte->relid, NoLock, NULL); + tableIds = find_all_inheritors(rte->relid, NoLock, NULL, NULL); foreach(li, tableIds) { diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml index ef7054cf26..00ba2906c2 100644 --- a/doc/src/sgml/catalogs.sgml +++ b/doc/src/sgml/catalogs.sgml @@ -3894,6 +3894,16 @@ SCRAM-SHA-256$<replaceable><iteration count></>:<replaceable><salt>< inherited columns are to be arranged. The count starts at 1. 
</entry> </row> + + <row> + <entry><structfield>inhpartitioned</structfield></entry> + <entry><type>bool</type></entry> + <entry></entry> + <entry> + This is <literal>true</> if the child table is a partitioned table, + <literal>false</> otherwise + </entry> + </row> </tbody> </tgroup> </table> diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c index 7618e4cb31..36f5c80b4f 100644 --- a/src/backend/catalog/partition.c +++ b/src/backend/catalog/partition.c @@ -196,7 +196,7 @@ RelationBuildPartitionDesc(Relation rel) return; /* Get partition oids from pg_inherits */ - inhoids = find_inheritance_children(RelationGetRelid(rel), NoLock); + inhoids = find_inheritance_children(RelationGetRelid(rel), NoLock, NULL); /* Collect bound spec nodes in a list */ i = 0; diff --git a/src/backend/catalog/pg_inherits.c b/src/backend/catalog/pg_inherits.c index 245a374fc9..5292ec8058 100644 --- a/src/backend/catalog/pg_inherits.c +++ b/src/backend/catalog/pg_inherits.c @@ -33,6 +33,8 @@ #include "utils/syscache.h" #include "utils/tqual.h" +static int32 inhchildinfo_cmp(const void *p1, const void *p2); + /* * Entry of a hash table used in find_all_inheritors. See below. */ @@ -42,6 +44,30 @@ typedef struct SeenRelsEntry ListCell *numparents_cell; /* corresponding list cell */ } SeenRelsEntry; +/* Information about one inheritance child table. */ +typedef struct InhChildInfo +{ + Oid relid; + bool is_partitioned; +} InhChildInfo; + +#define OID_CMP(o1, o2) \ + ((o1) < (o2) ? -1 : ((o1) > (o2) ? 1 : 0)) + +static int32 +inhchildinfo_cmp(const void *p1, const void *p2) +{ + InhChildInfo c1 = *((const InhChildInfo *) p1); + InhChildInfo c2 = *((const InhChildInfo *) p2); + + if (c1.is_partitioned && !c2.is_partitioned) + return -1; + if (!c1.is_partitioned && c2.is_partitioned) + return 1; + + return OID_CMP(c1.relid, c2.relid); +} + /* * find_inheritance_children * @@ -54,7 +80,8 @@ typedef struct SeenRelsEntry * against possible DROPs of child relations. 
*/ List * -find_inheritance_children(Oid parentrelId, LOCKMODE lockmode) +find_inheritance_children(Oid parentrelId, LOCKMODE lockmode, + int *num_partitioned_children) { List *list = NIL; Relation relation; @@ -62,9 +89,10 @@ find_inheritance_children(Oid parentrelId, LOCKMODE lockmode) ScanKeyData key[1]; HeapTuple inheritsTuple; Oid inhrelid; - Oid *oidarr; - int maxoids, - numoids, + InhChildInfo *inhchildren; + int maxchildren, + numchildren, + my_num_partitioned_children, i; /* @@ -77,9 +105,10 @@ find_inheritance_children(Oid parentrelId, LOCKMODE lockmode) /* * Scan pg_inherits and build a working array of subclass OIDs. */ - maxoids = 32; - oidarr = (Oid *) palloc(maxoids * sizeof(Oid)); - numoids = 0; + maxchildren = 32; + inhchildren = (InhChildInfo *) palloc(maxchildren * sizeof(InhChildInfo)); + numchildren = 0; + my_num_partitioned_children = 0; relation = heap_open(InheritsRelationId, AccessShareLock); @@ -93,34 +122,45 @@ find_inheritance_children(Oid parentrelId, LOCKMODE lockmode) while ((inheritsTuple = systable_getnext(scan)) != NULL) { - inhrelid = ((Form_pg_inherits) GETSTRUCT(inheritsTuple))->inhrelid; - if (numoids >= maxoids) + Form_pg_inherits form = (Form_pg_inherits) GETSTRUCT(inheritsTuple); + + if (numchildren >= maxchildren) { - maxoids *= 2; - oidarr = (Oid *) repalloc(oidarr, maxoids * sizeof(Oid)); + maxchildren *= 2; + inhchildren = (InhChildInfo *) repalloc(inhchildren, + maxchildren * sizeof(InhChildInfo)); } - oidarr[numoids++] = inhrelid; + inhchildren[numchildren].relid = form->inhrelid; + inhchildren[numchildren].is_partitioned = form->inhpartitioned; + + if (form->inhpartitioned) + my_num_partitioned_children++; + numchildren++; } systable_endscan(scan); heap_close(relation, AccessShareLock); + if (num_partitioned_children) + *num_partitioned_children = my_num_partitioned_children; + /* * If we found more than one child, sort them by OID. 
This ensures * reasonably consistent behavior regardless of the vagaries of an * indexscan. This is important since we need to be sure all backends * lock children in the same order to avoid needless deadlocks. */ - if (numoids > 1) - qsort(oidarr, numoids, sizeof(Oid), oid_cmp); + if (numchildren > 1) + qsort(inhchildren, numchildren, sizeof(InhChildInfo), + inhchildinfo_cmp); /* * Acquire locks and build the result list. */ - for (i = 0; i < numoids; i++) + for (i = 0; i < numchildren; i++) { - inhrelid = oidarr[i]; + inhrelid = inhchildren[i].relid; if (lockmode != NoLock) { @@ -144,7 +184,7 @@ find_inheritance_children(Oid parentrelId, LOCKMODE lockmode) list = lappend_oid(list, inhrelid); } - pfree(oidarr); + pfree(inhchildren); return list; } @@ -159,19 +199,30 @@ find_inheritance_children(Oid parentrelId, LOCKMODE lockmode) * given rel. * * The specified lock type is acquired on all child relations (but not on the - * given rel; caller should already have locked it). If lockmode is NoLock - * then no locks are acquired, but caller must beware of race conditions - * against possible DROPs of child relations. + * given rel; caller should already have locked it), unless + * lock_only_partitioned_children is specified, in which case, only the + * child relations that are partitioned tables are locked. If lockmode is + * NoLock then no locks are acquired, but caller must beware of race + * conditions against possible DROPs of child relations. + * + * Returned list of OIDs is such that all the partitioned tables in the tree + * appear at the head of the list. If num_partitioned_children is non-NULL, + * *num_partitioned_children returns the number of partitioned child table + * OIDs at the head of the list. 
*/ List * -find_all_inheritors(Oid parentrelId, LOCKMODE lockmode, List **numparents) +find_all_inheritors(Oid parentrelId, LOCKMODE lockmode, + List **numparents, int *num_partitioned_children) { /* hash table for O(1) rel_oid -> rel_numparents cell lookup */ HTAB *seen_rels; HASHCTL ctl; List *rels_list, - *rel_numparents; + *rel_numparents, + *partitioned_rels_list, + *other_rels_list; ListCell *l; + int my_num_partitioned_children; memset(&ctl, 0, sizeof(ctl)); ctl.keysize = sizeof(Oid); @@ -185,31 +236,69 @@ find_all_inheritors(Oid parentrelId, LOCKMODE lockmode, List **numparents) /* * We build a list starting with the given rel and adding all direct and - * indirect children. We can use a single list as both the record of - * already-found rels and the agenda of rels yet to be scanned for more - * children. This is a bit tricky but works because the foreach() macro - * doesn't fetch the next list element until the bottom of the loop. + * indirect children. We can use a single list (rels_list) as both the + * record of already-found rels and the agenda of rels yet to be scanned + * for more children. This is a bit tricky but works because the foreach() + * macro doesn't fetch the next list element until the bottom of the loop. + * + * partitioned_rels_list will contain the OIDs of the partitioned child + * tables and other_rels_list will contain the OIDs of the non-partitioned + * child tables. Result list will be generated by concatenating the two + * lists together with partitioned_rels_list appearing first. 
*/ rels_list = list_make1_oid(parentrelId); + partitioned_rels_list = list_make1_oid(parentrelId); + other_rels_list = NIL; rel_numparents = list_make1_int(0); + my_num_partitioned_children = 0; + foreach(l, rels_list) { Oid currentrel = lfirst_oid(l); List *currentchildren; - ListCell *lc; + ListCell *lc, + *first_nonpartitioned_child; + int cur_num_partitioned_children = 0, + i; /* Get the direct children of this rel */ - currentchildren = find_inheritance_children(currentrel, lockmode); + currentchildren = find_inheritance_children(currentrel, lockmode, + &cur_num_partitioned_children); + + my_num_partitioned_children += cur_num_partitioned_children; + + /* + * Append partitioned children to rels_list and partitioned_rels_list. + * We know for sure that partitioned children don't need the + * de-duplication logic in the following loop, because partitioned + * tables are not allowed to participate in multiple inheritance. + */ + i = 0; + foreach(lc, currentchildren) + { + if (i < cur_num_partitioned_children) + { + Oid child_oid = lfirst_oid(lc); + + rels_list = lappend_oid(rels_list, child_oid); + partitioned_rels_list = lappend_oid(partitioned_rels_list, + child_oid); + } + else + break; + i++; + } + first_nonpartitioned_child = lc; /* * Add to the queue only those children not already seen. This avoids * making duplicate entries in case of multiple inheritance paths from * the same parent. (It'll also keep us from getting into an infinite * loop, though theoretically there can't be any cycles in the - * inheritance graph anyway.) + * inheritance graph anyway.) Also, add them to the other_rels_list. */ - foreach(lc, currentchildren) + for_each_cell(lc, first_nonpartitioned_child) { Oid child_oid = lfirst_oid(lc); bool found; @@ -225,6 +314,7 @@ find_all_inheritors(Oid parentrelId, LOCKMODE lockmode, List **numparents) { /* if it's not there, add it. expect 1 parent, initially. 
*/ rels_list = lappend_oid(rels_list, child_oid); + other_rels_list = lappend_oid(other_rels_list, child_oid); rel_numparents = lappend_int(rel_numparents, 1); hash_entry->numparents_cell = rel_numparents->tail; } @@ -237,8 +327,13 @@ find_all_inheritors(Oid parentrelId, LOCKMODE lockmode, List **numparents) list_free(rel_numparents); hash_destroy(seen_rels); + list_free(rels_list); + + if (num_partitioned_children) + *num_partitioned_children = my_num_partitioned_children; - return rels_list; + /* List partitioned child tables before non-partitioned ones. */ + return list_concat(partitioned_rels_list, other_rels_list); } diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c index fbad13ea94..10cc2b8314 100644 --- a/src/backend/commands/analyze.c +++ b/src/backend/commands/analyze.c @@ -1282,7 +1282,8 @@ acquire_inherited_sample_rows(Relation onerel, int elevel, * the children. */ tableOIDs = - find_all_inheritors(RelationGetRelid(onerel), AccessShareLock, NULL); + find_all_inheritors(RelationGetRelid(onerel), AccessShareLock, NULL, + NULL); /* * Check that there's at least one descendant, else fail. 
This could diff --git a/src/backend/commands/lockcmds.c b/src/backend/commands/lockcmds.c index 9fe9e022b0..529f244f7e 100644 --- a/src/backend/commands/lockcmds.c +++ b/src/backend/commands/lockcmds.c @@ -112,7 +112,7 @@ LockTableRecurse(Oid reloid, LOCKMODE lockmode, bool nowait) List *children; ListCell *lc; - children = find_inheritance_children(reloid, NoLock); + children = find_inheritance_children(reloid, NoLock, NULL); foreach(lc, children) { diff --git a/src/backend/commands/publicationcmds.c b/src/backend/commands/publicationcmds.c index 610cb499d2..64179ea3ef 100644 --- a/src/backend/commands/publicationcmds.c +++ b/src/backend/commands/publicationcmds.c @@ -516,7 +516,7 @@ OpenTableList(List *tables) List *children; children = find_all_inheritors(myrelid, ShareUpdateExclusiveLock, - NULL); + NULL, NULL); foreach(child, children) { diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index 0f08245a67..4d686a6f71 100644 --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -299,10 +299,10 @@ static bool MergeCheckConstraint(List *constraints, char *name, Node *expr); static void MergeAttributesIntoExisting(Relation child_rel, Relation parent_rel); static void MergeConstraintsIntoExisting(Relation child_rel, Relation parent_rel); static void StoreCatalogInheritance(Oid relationId, List *supers, - bool child_is_partition); + bool child_is_partition, bool child_is_partitioned); static void StoreCatalogInheritance1(Oid relationId, Oid parentOid, int16 seqNumber, Relation inhRelation, - bool child_is_partition); + bool child_is_partition, bool child_is_partitioned); static int findAttrByName(const char *attributeName, List *schema); static void AlterIndexNamespaces(Relation classRel, Relation rel, Oid oldNspOid, Oid newNspOid, ObjectAddresses *objsMoved); @@ -753,7 +753,8 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId, typaddress); /* Store inheritance information for new rel. 
*/ - StoreCatalogInheritance(relationId, inheritOids, stmt->partbound != NULL); + StoreCatalogInheritance(relationId, inheritOids, stmt->partbound != NULL, + relkind == RELKIND_PARTITIONED_TABLE); /* * We must bump the command counter to make the newly-created relation @@ -1238,7 +1239,8 @@ ExecuteTruncate(TruncateStmt *stmt) ListCell *child; List *children; - children = find_all_inheritors(myrelid, AccessExclusiveLock, NULL); + children = find_all_inheritors(myrelid, AccessExclusiveLock, NULL, + NULL); foreach(child, children) { @@ -2305,7 +2307,7 @@ MergeCheckConstraint(List *constraints, char *name, Node *expr) */ static void StoreCatalogInheritance(Oid relationId, List *supers, - bool child_is_partition) + bool child_is_partition, bool child_is_partitioned) { Relation relation; int16 seqNumber; @@ -2336,7 +2338,7 @@ StoreCatalogInheritance(Oid relationId, List *supers, Oid parentOid = lfirst_oid(entry); StoreCatalogInheritance1(relationId, parentOid, seqNumber, relation, - child_is_partition); + child_is_partition, child_is_partitioned); seqNumber++; } @@ -2350,7 +2352,7 @@ StoreCatalogInheritance(Oid relationId, List *supers, static void StoreCatalogInheritance1(Oid relationId, Oid parentOid, int16 seqNumber, Relation inhRelation, - bool child_is_partition) + bool child_is_partition, bool child_is_partitioned) { TupleDesc desc = RelationGetDescr(inhRelation); Datum values[Natts_pg_inherits]; @@ -2365,6 +2367,8 @@ StoreCatalogInheritance1(Oid relationId, Oid parentOid, values[Anum_pg_inherits_inhrelid - 1] = ObjectIdGetDatum(relationId); values[Anum_pg_inherits_inhparent - 1] = ObjectIdGetDatum(parentOid); values[Anum_pg_inherits_inhseqno - 1] = Int16GetDatum(seqNumber); + values[Anum_pg_inherits_inhpartitioned - 1] = + BoolGetDatum(child_is_partitioned); memset(nulls, 0, sizeof(nulls)); @@ -2564,7 +2568,7 @@ renameatt_internal(Oid myrelid, * outside the inheritance hierarchy being processed. 
*/ child_oids = find_all_inheritors(myrelid, AccessExclusiveLock, - &child_numparents); + &child_numparents, NULL); /* * find_all_inheritors does the recursive search of the inheritance @@ -2591,7 +2595,7 @@ renameatt_internal(Oid myrelid, * expected_parents will only be 0 if we are not already recursing. */ if (expected_parents == 0 && - find_inheritance_children(myrelid, NoLock) != NIL) + find_inheritance_children(myrelid, NoLock, NULL) != NIL) ereport(ERROR, (errcode(ERRCODE_INVALID_TABLE_DEFINITION), errmsg("inherited column \"%s\" must be renamed in child tables too", @@ -2774,7 +2778,7 @@ rename_constraint_internal(Oid myrelid, *li; child_oids = find_all_inheritors(myrelid, AccessExclusiveLock, - &child_numparents); + &child_numparents, NULL); forboth(lo, child_oids, li, child_numparents) { @@ -2790,7 +2794,7 @@ rename_constraint_internal(Oid myrelid, else { if (expected_parents == 0 && - find_inheritance_children(myrelid, NoLock) != NIL) + find_inheritance_children(myrelid, NoLock, NULL) != NIL) ereport(ERROR, (errcode(ERRCODE_INVALID_TABLE_DEFINITION), errmsg("inherited constraint \"%s\" must be renamed in child tables too", @@ -4803,7 +4807,7 @@ ATSimpleRecursion(List **wqueue, Relation rel, ListCell *child; List *children; - children = find_all_inheritors(relid, lockmode, NULL); + children = find_all_inheritors(relid, lockmode, NULL, NULL); /* * find_all_inheritors does the recursive search of the inheritance @@ -5212,7 +5216,7 @@ ATExecAddColumn(List **wqueue, AlteredTableInfo *tab, Relation rel, */ if (colDef->identity && recurse && - find_inheritance_children(myrelid, NoLock) != NIL) + find_inheritance_children(myrelid, NoLock, NULL) != NIL) ereport(ERROR, (errcode(ERRCODE_INVALID_TABLE_DEFINITION), errmsg("cannot recursively add identity column to table that has child tables"))); @@ -5418,7 +5422,8 @@ ATExecAddColumn(List **wqueue, AlteredTableInfo *tab, Relation rel, * routines, we have to do this one level of recursion at a time; we can't * use 
find_all_inheritors to do it in one pass. */ - children = find_inheritance_children(RelationGetRelid(rel), lockmode); + children = find_inheritance_children(RelationGetRelid(rel), lockmode, + NULL); /* * If we are told not to recurse, there had better not be any child @@ -6537,7 +6542,8 @@ ATExecDropColumn(List **wqueue, Relation rel, const char *colName, * routines, we have to do this one level of recursion at a time; we can't * use find_all_inheritors to do it in one pass. */ - children = find_inheritance_children(RelationGetRelid(rel), lockmode); + children = find_inheritance_children(RelationGetRelid(rel), lockmode, + NULL); if (children) { @@ -6971,7 +6977,8 @@ ATAddCheckConstraint(List **wqueue, AlteredTableInfo *tab, Relation rel, * routines, we have to do this one level of recursion at a time; we can't * use find_all_inheritors to do it in one pass. */ - children = find_inheritance_children(RelationGetRelid(rel), lockmode); + children = find_inheritance_children(RelationGetRelid(rel), lockmode, + NULL); /* * Check if ONLY was specified with ALTER TABLE. If so, allow the @@ -7692,7 +7699,7 @@ ATExecValidateConstraint(Relation rel, char *constrName, bool recurse, */ if (!recursing && !con->connoinherit) children = find_all_inheritors(RelationGetRelid(rel), - lockmode, NULL); + lockmode, NULL, NULL); /* * For CHECK constraints, we must ensure that we only mark the @@ -8575,7 +8582,8 @@ ATExecDropConstraint(Relation rel, const char *constrName, * use find_all_inheritors to do it in one pass. 
*/ if (!is_no_inherit_constraint) - children = find_inheritance_children(RelationGetRelid(rel), lockmode); + children = find_inheritance_children(RelationGetRelid(rel), lockmode, + NULL); else children = NIL; @@ -8864,7 +8872,7 @@ ATPrepAlterColumnType(List **wqueue, ListCell *child; List *children; - children = find_all_inheritors(relid, lockmode, NULL); + children = find_all_inheritors(relid, lockmode, NULL, NULL); /* * find_all_inheritors does the recursive search of the inheritance @@ -8915,7 +8923,8 @@ ATPrepAlterColumnType(List **wqueue, } } else if (!recursing && - find_inheritance_children(RelationGetRelid(rel), NoLock) != NIL) + find_inheritance_children(RelationGetRelid(rel), + NoLock, NULL) != NIL) ereport(ERROR, (errcode(ERRCODE_INVALID_TABLE_DEFINITION), errmsg("type of inherited column \"%s\" must be changed in child tables too", @@ -11027,7 +11036,7 @@ ATExecAddInherit(Relation child_rel, RangeVar *parent, LOCKMODE lockmode) * We use weakest lock we can on child's children, namely AccessShareLock. */ children = find_all_inheritors(RelationGetRelid(child_rel), - AccessShareLock, NULL); + AccessShareLock, NULL, NULL); if (list_member_oid(children, RelationGetRelid(parent_rel))) ereport(ERROR, @@ -11136,6 +11145,8 @@ CreateInheritance(Relation child_rel, Relation parent_rel) inhseqno + 1, catalogRelation, parent_rel->rd_rel->relkind == + RELKIND_PARTITIONED_TABLE, + child_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE); /* Now we're done with pg_inherits */ @@ -13696,7 +13707,8 @@ ATExecAttachPartition(List **wqueue, Relation rel, PartitionCmd *cmd) * weaker lock now and the stronger one only when needed. 
*/ attachrel_children = find_all_inheritors(RelationGetRelid(attachrel), - AccessExclusiveLock, NULL); + AccessExclusiveLock, NULL, + NULL); if (list_member_oid(attachrel_children, RelationGetRelid(rel))) ereport(ERROR, (errcode(ERRCODE_DUPLICATE_TABLE), diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c index faa181207a..e2e5ffce42 100644 --- a/src/backend/commands/vacuum.c +++ b/src/backend/commands/vacuum.c @@ -430,7 +430,8 @@ get_rel_oids(Oid relid, const RangeVar *vacrel) oldcontext = MemoryContextSwitchTo(vac_context); if (include_parts) oid_list = list_concat(oid_list, - find_all_inheritors(relid, NoLock, NULL)); + find_all_inheritors(relid, NoLock, NULL, + NULL)); else oid_list = lappend_oid(oid_list, relid); MemoryContextSwitchTo(oldcontext); diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c index a03188aba3..4424649769 100644 --- a/src/backend/executor/execMain.c +++ b/src/backend/executor/execMain.c @@ -3278,7 +3278,8 @@ ExecSetupPartitionTupleRouting(Relation rel, * Get the information about the partition tree after locking all the * partitions. 
*/ - (void) find_all_inheritors(RelationGetRelid(rel), RowExclusiveLock, NULL); + (void) find_all_inheritors(RelationGetRelid(rel), RowExclusiveLock, NULL, + NULL); RelationGetPartitionDispatchInfo(rel, &ptinfos, &leaf_parts); /* diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c index 68d0d8efa3..b84d6c8878 100644 --- a/src/backend/optimizer/prep/prepunion.c +++ b/src/backend/optimizer/prep/prepunion.c @@ -1425,7 +1425,7 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti) lockmode = AccessShareLock; /* Scan for all members of inheritance set, acquire needed locks */ - inhOIDs = find_all_inheritors(parentOID, lockmode, NULL); + inhOIDs = find_all_inheritors(parentOID, lockmode, NULL, NULL); /* * Check that there's at least one descendant, else treat as no-child diff --git a/src/include/catalog/pg_inherits.h b/src/include/catalog/pg_inherits.h index 26bfab5db6..9f59c017e7 100644 --- a/src/include/catalog/pg_inherits.h +++ b/src/include/catalog/pg_inherits.h @@ -30,9 +30,20 @@ CATALOG(pg_inherits,2611) BKI_WITHOUT_OIDS { + /* OID of the child table. */ Oid inhrelid; + + /* OID of the parent table. */ Oid inhparent; + + /* + * Sequence number (starting with 1) of this parent, if this child table + * has multiple parents. + */ int32 inhseqno; + + /* true if the child is a partitioned table, false otherwise. 
*/ + bool inhpartitioned; } FormData_pg_inherits; /* ---------------- @@ -46,10 +57,11 @@ typedef FormData_pg_inherits *Form_pg_inherits; * compiler constants for pg_inherits * ---------------- */ -#define Natts_pg_inherits 3 -#define Anum_pg_inherits_inhrelid 1 -#define Anum_pg_inherits_inhparent 2 -#define Anum_pg_inherits_inhseqno 3 +#define Natts_pg_inherits 4 +#define Anum_pg_inherits_inhrelid 1 +#define Anum_pg_inherits_inhparent 2 +#define Anum_pg_inherits_inhseqno 3 +#define Anum_pg_inherits_inhpartitioned 4 /* ---------------- * pg_inherits has no initial contents diff --git a/src/include/catalog/pg_inherits_fn.h b/src/include/catalog/pg_inherits_fn.h index 7743388899..8f371acae7 100644 --- a/src/include/catalog/pg_inherits_fn.h +++ b/src/include/catalog/pg_inherits_fn.h @@ -17,9 +17,10 @@ #include "nodes/pg_list.h" #include "storage/lock.h" -extern List *find_inheritance_children(Oid parentrelId, LOCKMODE lockmode); +extern List *find_inheritance_children(Oid parentrelId, LOCKMODE lockmode, + int *num_partitioned_children); extern List *find_all_inheritors(Oid parentrelId, LOCKMODE lockmode, - List **parents); + List **parents, int *num_partitioned_children); extern bool has_subclass(Oid relationId); extern bool has_superclass(Oid relationId); extern bool typeInheritsFrom(Oid subclassTypeId, Oid superclassTypeId); -- 2.11.0
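
The child ordering that patch 1/3 introduces (partitioned children first, then ascending OID within each group) can be tried out on its own with a small standalone program fragment. This is a sketch written for illustration, not code from the patch: DemoOid, DemoChild, demo_child_cmp(), and demo_sort_children() are hypothetical names, and the comparator takes pointers rather than copying the structs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef unsigned int DemoOid;   /* stand-in for Oid */

/* Stand-in for the per-child info collected from pg_inherits. */
typedef struct DemoChild
{
    DemoOid     relid;
    int         is_partitioned;
} DemoChild;

/* qsort comparator: partitioned children first, then by ascending OID. */
static int
demo_child_cmp(const void *p1, const void *p2)
{
    const DemoChild *c1 = p1;
    const DemoChild *c2 = p2;

    if (c1->is_partitioned && !c2->is_partitioned)
        return -1;
    if (!c1->is_partitioned && c2->is_partitioned)
        return 1;
    if (c1->relid < c2->relid)
        return -1;
    if (c1->relid > c2->relid)
        return 1;
    return 0;
}

/*
 * Sort the children in place and return how many partitioned children now
 * sit at the head of the array.
 */
size_t
demo_sort_children(DemoChild *children, size_t n)
{
    size_t      i;
    size_t      npartitioned = 0;

    assert(children != NULL || n == 0);
    if (n > 1)
        qsort(children, n, sizeof(DemoChild), demo_child_cmp);
    for (i = 0; i < n && children[i].is_partitioned; i++)
        npartitioned++;
    return npartitioned;
}
```

Because the order depends only on the partitioned flag and the OID, every caller that sorts the same set of children this way would visit them, and hence lock them, in the same sequence.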
From ef86d03a6ed6ac0cdbdede0c1012f9006ed24de2 Mon Sep 17 00:00:00 2001 From: amit <amitlangot...@gmail.com> Date: Thu, 10 Aug 2017 17:59:18 +0900 Subject: [PATCH 2/3] Allow locking only partitioned children in partition tree find_inheritance_children will still return the OIDs of the non-partitioned children, but does not lock them if the caller so requests. None of the callers pass 'true' yet though. --- contrib/sepgsql/dml.c | 3 ++- src/backend/catalog/partition.c | 3 ++- src/backend/catalog/pg_inherits.c | 20 ++++++++++++++++---- src/backend/commands/analyze.c | 4 ++-- src/backend/commands/lockcmds.c | 2 +- src/backend/commands/publicationcmds.c | 2 +- src/backend/commands/tablecmds.c | 34 +++++++++++++++++----------------- src/backend/commands/vacuum.c | 4 ++-- src/backend/executor/execMain.c | 4 ++-- src/backend/optimizer/prep/prepunion.c | 2 +- src/include/catalog/pg_inherits_fn.h | 2 ++ 11 files changed, 48 insertions(+), 32 deletions(-) diff --git a/contrib/sepgsql/dml.c b/contrib/sepgsql/dml.c index 6fc279805c..91f338f8bf 100644 --- a/contrib/sepgsql/dml.c +++ b/contrib/sepgsql/dml.c @@ -333,7 +333,8 @@ sepgsql_dml_privileges(List *rangeTabls, bool abort_on_violation) if (!rte->inh) tableIds = list_make1_oid(rte->relid); else - tableIds = find_all_inheritors(rte->relid, NoLock, NULL, NULL); + tableIds = find_all_inheritors(rte->relid, NoLock, false, + NULL, NULL); foreach(li, tableIds) { diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c index 36f5c80b4f..c972760fe4 100644 --- a/src/backend/catalog/partition.c +++ b/src/backend/catalog/partition.c @@ -196,7 +196,8 @@ RelationBuildPartitionDesc(Relation rel) return; /* Get partition oids from pg_inherits */ - inhoids = find_inheritance_children(RelationGetRelid(rel), NoLock, NULL); + inhoids = find_inheritance_children(RelationGetRelid(rel), NoLock, false, + NULL); /* Collect bound spec nodes in a list */ i = 0; diff --git a/src/backend/catalog/pg_inherits.c 
b/src/backend/catalog/pg_inherits.c index 5292ec8058..72420f65f1 100644 --- a/src/backend/catalog/pg_inherits.c +++ b/src/backend/catalog/pg_inherits.c @@ -74,13 +74,16 @@ inhchildinfo_cmp(const void *p1, const void *p2) * Returns a list containing the OIDs of all relations which * inherit *directly* from the relation with OID 'parentrelId'. * - * The specified lock type is acquired on each child relation (but not on the - * given rel; caller should already have locked it). If lockmode is NoLock - * then no locks are acquired, but caller must beware of race conditions - * against possible DROPs of child relations. + * The specified lock type is acquired on each child relation, (but not on the + * given rel; caller should already have locked it), unless + * lock_only_partitioned_children is specified in which case only partitioned + * children are locked. If lockmode is NoLock then no locks are acquired, but + * caller must beware of race conditions against possible DROPs of child + * relations. */ List * find_inheritance_children(Oid parentrelId, LOCKMODE lockmode, + bool lock_only_partitioned_children, int *num_partitioned_children) { List *list = NIL; @@ -162,6 +165,13 @@ find_inheritance_children(Oid parentrelId, LOCKMODE lockmode, { inhrelid = inhchildren[i].relid; + /* If requested, skip locking non-partitioned children. 
*/ + if (lock_only_partitioned_children && i >= *num_partitioned_children) + { + list = lappend_oid(list, inhrelid); + continue; + } + if (lockmode != NoLock) { /* Get the lock to synchronize against concurrent drop */ @@ -212,6 +222,7 @@ find_inheritance_children(Oid parentrelId, LOCKMODE lockmode, */ List * find_all_inheritors(Oid parentrelId, LOCKMODE lockmode, + bool lock_only_partitioned_children, List **numparents, int *num_partitioned_children) { /* hash table for O(1) rel_oid -> rel_numparents cell lookup */ @@ -264,6 +275,7 @@ find_all_inheritors(Oid parentrelId, LOCKMODE lockmode, /* Get the direct children of this rel */ currentchildren = find_inheritance_children(currentrel, lockmode, + lock_only_partitioned_children, &cur_num_partitioned_children); my_num_partitioned_children += cur_num_partitioned_children; diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c index 10cc2b8314..4bd374632f 100644 --- a/src/backend/commands/analyze.c +++ b/src/backend/commands/analyze.c @@ -1282,8 +1282,8 @@ acquire_inherited_sample_rows(Relation onerel, int elevel, * the children. */ tableOIDs = - find_all_inheritors(RelationGetRelid(onerel), AccessShareLock, NULL, - NULL); + find_all_inheritors(RelationGetRelid(onerel), AccessShareLock, false, + NULL, NULL); /* * Check that there's at least one descendant, else fail. 
This could diff --git a/src/backend/commands/lockcmds.c b/src/backend/commands/lockcmds.c index 529f244f7e..771aa11b1c 100644 --- a/src/backend/commands/lockcmds.c +++ b/src/backend/commands/lockcmds.c @@ -112,7 +112,7 @@ LockTableRecurse(Oid reloid, LOCKMODE lockmode, bool nowait) List *children; ListCell *lc; - children = find_inheritance_children(reloid, NoLock, NULL); + children = find_inheritance_children(reloid, NoLock, false, NULL); foreach(lc, children) { diff --git a/src/backend/commands/publicationcmds.c b/src/backend/commands/publicationcmds.c index 64179ea3ef..4315028c66 100644 --- a/src/backend/commands/publicationcmds.c +++ b/src/backend/commands/publicationcmds.c @@ -516,7 +516,7 @@ OpenTableList(List *tables) List *children; children = find_all_inheritors(myrelid, ShareUpdateExclusiveLock, - NULL, NULL); + false, NULL, NULL); foreach(child, children) { diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index 4d686a6f71..ef3869854a 100644 --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -1239,8 +1239,8 @@ ExecuteTruncate(TruncateStmt *stmt) ListCell *child; List *children; - children = find_all_inheritors(myrelid, AccessExclusiveLock, NULL, - NULL); + children = find_all_inheritors(myrelid, AccessExclusiveLock, false, + NULL, NULL); foreach(child, children) { @@ -2567,7 +2567,7 @@ renameatt_internal(Oid myrelid, * calls to renameatt() can determine whether there are any parents * outside the inheritance hierarchy being processed. */ - child_oids = find_all_inheritors(myrelid, AccessExclusiveLock, + child_oids = find_all_inheritors(myrelid, AccessExclusiveLock, false, &child_numparents, NULL); /* @@ -2595,7 +2595,7 @@ renameatt_internal(Oid myrelid, * expected_parents will only be 0 if we are not already recursing. 
*/ if (expected_parents == 0 && - find_inheritance_children(myrelid, NoLock, NULL) != NIL) + find_inheritance_children(myrelid, NoLock, false, NULL) != NIL) ereport(ERROR, (errcode(ERRCODE_INVALID_TABLE_DEFINITION), errmsg("inherited column \"%s\" must be renamed in child tables too", @@ -2778,7 +2778,7 @@ rename_constraint_internal(Oid myrelid, *li; child_oids = find_all_inheritors(myrelid, AccessExclusiveLock, - &child_numparents, NULL); + false, &child_numparents, NULL); forboth(lo, child_oids, li, child_numparents) { @@ -2794,7 +2794,7 @@ rename_constraint_internal(Oid myrelid, else { if (expected_parents == 0 && - find_inheritance_children(myrelid, NoLock, NULL) != NIL) + find_inheritance_children(myrelid, NoLock, false, NULL) != NIL) ereport(ERROR, (errcode(ERRCODE_INVALID_TABLE_DEFINITION), errmsg("inherited constraint \"%s\" must be renamed in child tables too", @@ -4807,7 +4807,7 @@ ATSimpleRecursion(List **wqueue, Relation rel, ListCell *child; List *children; - children = find_all_inheritors(relid, lockmode, NULL, NULL); + children = find_all_inheritors(relid, lockmode, false, NULL, NULL); /* * find_all_inheritors does the recursive search of the inheritance @@ -5216,7 +5216,7 @@ ATExecAddColumn(List **wqueue, AlteredTableInfo *tab, Relation rel, */ if (colDef->identity && recurse && - find_inheritance_children(myrelid, NoLock, NULL) != NIL) + find_inheritance_children(myrelid, NoLock, false, NULL) != NIL) ereport(ERROR, (errcode(ERRCODE_INVALID_TABLE_DEFINITION), errmsg("cannot recursively add identity column to table that has child tables"))); @@ -5423,7 +5423,7 @@ ATExecAddColumn(List **wqueue, AlteredTableInfo *tab, Relation rel, * use find_all_inheritors to do it in one pass. 
*/ children = find_inheritance_children(RelationGetRelid(rel), lockmode, - NULL); + false, NULL); /* * If we are told not to recurse, there had better not be any child @@ -6543,7 +6543,7 @@ ATExecDropColumn(List **wqueue, Relation rel, const char *colName, * use find_all_inheritors to do it in one pass. */ children = find_inheritance_children(RelationGetRelid(rel), lockmode, - NULL); + false, NULL); if (children) { @@ -6978,7 +6978,7 @@ ATAddCheckConstraint(List **wqueue, AlteredTableInfo *tab, Relation rel, * use find_all_inheritors to do it in one pass. */ children = find_inheritance_children(RelationGetRelid(rel), lockmode, - NULL); + false, NULL); /* * Check if ONLY was specified with ALTER TABLE. If so, allow the @@ -7699,7 +7699,7 @@ ATExecValidateConstraint(Relation rel, char *constrName, bool recurse, */ if (!recursing && !con->connoinherit) children = find_all_inheritors(RelationGetRelid(rel), - lockmode, NULL, NULL); + lockmode, false, NULL, NULL); /* * For CHECK constraints, we must ensure that we only mark the @@ -8583,7 +8583,7 @@ ATExecDropConstraint(Relation rel, const char *constrName, */ if (!is_no_inherit_constraint) children = find_inheritance_children(RelationGetRelid(rel), lockmode, - NULL); + false, NULL); else children = NIL; @@ -8872,7 +8872,7 @@ ATPrepAlterColumnType(List **wqueue, ListCell *child; List *children; - children = find_all_inheritors(relid, lockmode, NULL, NULL); + children = find_all_inheritors(relid, lockmode, false, NULL, NULL); /* * find_all_inheritors does the recursive search of the inheritance @@ -8924,7 +8924,7 @@ ATPrepAlterColumnType(List **wqueue, } else if (!recursing && find_inheritance_children(RelationGetRelid(rel), - NoLock, NULL) != NIL) + NoLock, false, NULL) != NIL) ereport(ERROR, (errcode(ERRCODE_INVALID_TABLE_DEFINITION), errmsg("type of inherited column \"%s\" must be changed in child tables too", @@ -11036,7 +11036,7 @@ ATExecAddInherit(Relation child_rel, RangeVar *parent, LOCKMODE lockmode) * We use 
weakest lock we can on child's children, namely AccessShareLock. */ children = find_all_inheritors(RelationGetRelid(child_rel), - AccessShareLock, NULL, NULL); + AccessShareLock, false, NULL, NULL); if (list_member_oid(children, RelationGetRelid(parent_rel))) ereport(ERROR, @@ -13707,7 +13707,7 @@ ATExecAttachPartition(List **wqueue, Relation rel, PartitionCmd *cmd) * weaker lock now and the stronger one only when needed. */ attachrel_children = find_all_inheritors(RelationGetRelid(attachrel), - AccessExclusiveLock, NULL, + AccessExclusiveLock, false, NULL, NULL); if (list_member_oid(attachrel_children, RelationGetRelid(rel))) ereport(ERROR, diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c index e2e5ffce42..70cd5721f3 100644 --- a/src/backend/commands/vacuum.c +++ b/src/backend/commands/vacuum.c @@ -430,8 +430,8 @@ get_rel_oids(Oid relid, const RangeVar *vacrel) oldcontext = MemoryContextSwitchTo(vac_context); if (include_parts) oid_list = list_concat(oid_list, - find_all_inheritors(relid, NoLock, NULL, - NULL)); + find_all_inheritors(relid, NoLock, false, + NULL, NULL)); else oid_list = lappend_oid(oid_list, relid); MemoryContextSwitchTo(oldcontext); diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c index 4424649769..63529ab1dd 100644 --- a/src/backend/executor/execMain.c +++ b/src/backend/executor/execMain.c @@ -3278,8 +3278,8 @@ ExecSetupPartitionTupleRouting(Relation rel, * Get the information about the partition tree after locking all the * partitions. 
*/ - (void) find_all_inheritors(RelationGetRelid(rel), RowExclusiveLock, NULL, - NULL); + (void) find_all_inheritors(RelationGetRelid(rel), RowExclusiveLock, false, + NULL, NULL); RelationGetPartitionDispatchInfo(rel, &ptinfos, &leaf_parts); /* diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c index b84d6c8878..ee2e066263 100644 --- a/src/backend/optimizer/prep/prepunion.c +++ b/src/backend/optimizer/prep/prepunion.c @@ -1425,7 +1425,7 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti) lockmode = AccessShareLock; /* Scan for all members of inheritance set, acquire needed locks */ - inhOIDs = find_all_inheritors(parentOID, lockmode, NULL, NULL); + inhOIDs = find_all_inheritors(parentOID, lockmode, false, NULL, NULL); /* * Check that there's at least one descendant, else treat as no-child diff --git a/src/include/catalog/pg_inherits_fn.h b/src/include/catalog/pg_inherits_fn.h index 8f371acae7..e568d11e43 100644 --- a/src/include/catalog/pg_inherits_fn.h +++ b/src/include/catalog/pg_inherits_fn.h @@ -18,8 +18,10 @@ #include "storage/lock.h" extern List *find_inheritance_children(Oid parentrelId, LOCKMODE lockmode, + bool lock_only_partitioned_children, int *num_partitioned_children); extern List *find_all_inheritors(Oid parentrelId, LOCKMODE lockmode, + bool lock_only_partitioned_children, List **parents, int *num_partitioned_children); extern bool has_subclass(Oid relationId); extern bool has_superclass(Oid relationId); -- 2.11.0
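For anyone skimming patch 2, the locking rule it adds to find_inheritance_children() can be sketched outside the backend roughly as follows. This is only an illustration under stated assumptions, not the patch's code: LockMode, NO_LOCK, should_lock_child() and count_locked_children() are hypothetical stand-ins, and the sketch assumes (as the patch's "i >= *num_partitioned_children" test implies) that the sorted child array lists the partitioned children first. Unlike the patch, the sketch takes the count by value, so a NULL count pointer cannot be dereferenced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for LOCKMODE; 0 plays the role of NoLock. */
typedef int LockMode;
#define NO_LOCK 0

/*
 * Decide whether the child at position i would be locked.
 *
 * Assumption: children are ordered so that the first num_partitioned
 * entries are the partitioned ones and the rest are leaf partitions.
 */
static bool
should_lock_child(size_t i, size_t nchildren, size_t num_partitioned,
                  LockMode lockmode, bool lock_only_partitioned)
{
    assert(i < nchildren);
    assert(num_partitioned <= nchildren);

    /* With NoLock, nothing is locked regardless of the flag. */
    if (lockmode == NO_LOCK)
        return false;

    /* If requested, skip locking non-partitioned (leaf) children. */
    if (lock_only_partitioned && i >= num_partitioned)
        return false;

    return true;
}

/* Count how many children would be locked; every child is still returned. */
static size_t
count_locked_children(size_t nchildren, size_t num_partitioned,
                      LockMode lockmode, bool lock_only_partitioned)
{
    size_t i;
    size_t nlocked = 0;

    for (i = 0; i < nchildren; i++)
    {
        if (should_lock_child(i, nchildren, num_partitioned,
                              lockmode, lock_only_partitioned))
            nlocked++;
    }
    return nlocked;
}
```

With five children of which the first two are partitioned, passing the flag as true locks only those two, while passing false (what every caller does in this patch) locks all five, which is why the patch does not change behavior yet.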
From 49582f6707611a572b441bf692fd925e9d658781 Mon Sep 17 00:00:00 2001 From: amit <amitlangot...@gmail.com> Date: Wed, 26 Jul 2017 14:42:47 +0900 Subject: [PATCH 3/3] WIP: Defer opening and locking partitions to set_append_rel_size --- src/backend/catalog/partition.c | 20 ++ src/backend/nodes/copyfuncs.c | 17 -- src/backend/nodes/equalfuncs.c | 12 -- src/backend/nodes/outfuncs.c | 57 +++++- src/backend/optimizer/path/allpaths.c | 357 +++++++++++++++++++++++++++++++-- src/backend/optimizer/plan/planner.c | 106 ++++++++-- src/backend/optimizer/prep/prepunion.c | 266 +++++++++++++++--------- src/backend/optimizer/util/plancat.c | 44 ++++ src/backend/optimizer/util/relnode.c | 81 +++++++- src/backend/utils/cache/lsyscache.c | 50 +++++ src/include/catalog/partition.h | 4 + src/include/nodes/nodes.h | 5 +- src/include/nodes/relation.h | 93 +++++++-- src/include/optimizer/plancat.h | 1 + src/include/optimizer/prep.h | 3 + src/include/utils/lsyscache.h | 2 + src/test/regress/expected/insert.out | 4 +- 17 files changed, 938 insertions(+), 184 deletions(-) diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c index c972760fe4..41127a584e 100644 --- a/src/backend/catalog/partition.c +++ b/src/backend/catalog/partition.c @@ -1161,6 +1161,26 @@ RelationGetPartitionDispatchInfo(Relation rel, Assert((offset + 1) == list_length(*ptinfos)); } +/* + * get_partitions_for_keys + * Returns the list of indexes (from pd->indexes) of the partitions that + * will need to be scanned for the given scan keys. + * + * TODO: add the interface to pass the query scan keys and the logic to look + * up partitions using those keys. 
+ */ +List * +get_partitions_for_keys(PartitionDispatch pd) +{ + int i; + List *result = NIL; + + for (i = 0; i < pd->partdesc->nparts; i++) + result = lappend_int(result, pd->indexes[i]); + + return result; +} + /* Module-local functions */ /* diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c index 72041693df..8d17d7f52c 100644 --- a/src/backend/nodes/copyfuncs.c +++ b/src/backend/nodes/copyfuncs.c @@ -2249,20 +2249,6 @@ _copyAppendRelInfo(const AppendRelInfo *from) } /* - * _copyPartitionedChildRelInfo - */ -static PartitionedChildRelInfo * -_copyPartitionedChildRelInfo(const PartitionedChildRelInfo *from) -{ - PartitionedChildRelInfo *newnode = makeNode(PartitionedChildRelInfo); - - COPY_SCALAR_FIELD(parent_relid); - COPY_NODE_FIELD(child_rels); - - return newnode; -} - -/* * _copyPlaceHolderInfo */ static PlaceHolderInfo * @@ -4994,9 +4980,6 @@ copyObjectImpl(const void *from) case T_AppendRelInfo: retval = _copyAppendRelInfo(from); break; - case T_PartitionedChildRelInfo: - retval = _copyPartitionedChildRelInfo(from); - break; case T_PlaceHolderInfo: retval = _copyPlaceHolderInfo(from); break; diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c index 8d92c03633..fb248f31f3 100644 --- a/src/backend/nodes/equalfuncs.c +++ b/src/backend/nodes/equalfuncs.c @@ -905,15 +905,6 @@ _equalAppendRelInfo(const AppendRelInfo *a, const AppendRelInfo *b) } static bool -_equalPartitionedChildRelInfo(const PartitionedChildRelInfo *a, const PartitionedChildRelInfo *b) -{ - COMPARE_SCALAR_FIELD(parent_relid); - COMPARE_NODE_FIELD(child_rels); - - return true; -} - -static bool _equalPlaceHolderInfo(const PlaceHolderInfo *a, const PlaceHolderInfo *b) { COMPARE_SCALAR_FIELD(phid); @@ -3155,9 +3146,6 @@ equal(const void *a, const void *b) case T_AppendRelInfo: retval = _equalAppendRelInfo(a, b); break; - case T_PartitionedChildRelInfo: - retval = _equalPartitionedChildRelInfo(a, b); - break; case T_PlaceHolderInfo: retval = 
_equalPlaceHolderInfo(a, b); break; diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c index 5ce3c7c599..1c7caca013 100644 --- a/src/backend/nodes/outfuncs.c +++ b/src/backend/nodes/outfuncs.c @@ -2211,7 +2211,7 @@ _outPlannerInfo(StringInfo str, const PlannerInfo *node) WRITE_NODE_FIELD(full_join_clauses); WRITE_NODE_FIELD(join_info_list); WRITE_NODE_FIELD(append_rel_list); - WRITE_NODE_FIELD(pcinfo_list); + WRITE_NODE_FIELD(prinfo_list); WRITE_NODE_FIELD(rowMarks); WRITE_NODE_FIELD(placeholder_list); WRITE_NODE_FIELD(fkey_list); @@ -2285,6 +2285,12 @@ _outRelOptInfo(StringInfo str, const RelOptInfo *node) WRITE_NODE_FIELD(joininfo); WRITE_BOOL_FIELD(has_eclass_joins); WRITE_BITMAPSET_FIELD(top_parent_relids); + WRITE_INT_FIELD(num_parted); + /* don't bother printing partition_infos */ + WRITE_INT_FIELD(num_leaf_parts); + /* don't bother printing leaf_part_infos */ + WRITE_NODE_FIELD(live_partition_painfos); + WRITE_UINT_FIELD(root_parent_relid); } static void @@ -2510,12 +2516,42 @@ _outAppendRelInfo(StringInfo str, const AppendRelInfo *node) } static void -_outPartitionedChildRelInfo(StringInfo str, const PartitionedChildRelInfo *node) +_outPartitionInfo(StringInfo str, const PartitionInfo *node) { - WRITE_NODE_TYPE("PARTITIONEDCHILDRELINFO"); + WRITE_NODE_TYPE("PARTITIONINFO"); + + WRITE_UINT_FIELD(relid); + /* Don't bother writing out the PartitionDispatch object */ +} + +static void +_outLeafPartitionInfo(StringInfo str, const LeafPartitionInfo *node) +{ + WRITE_NODE_TYPE("LEAFPARTITIONINFO"); + + WRITE_OID_FIELD(reloid); + WRITE_UINT_FIELD(relid); +} + +static void +_outPartitionAppendInfo(StringInfo str, const PartitionAppendInfo *node) +{ + WRITE_NODE_TYPE("PARTITIONAPPENDINFO"); + + WRITE_UINT_FIELD(parent_relid); + WRITE_NODE_FIELD(live_partition_relids); +} + +static void +_outPartitionRootInfo(StringInfo str, const PartitionRootInfo *node) +{ + WRITE_NODE_TYPE("PARTITIONROOTINFO"); WRITE_UINT_FIELD(parent_relid); - 
WRITE_NODE_FIELD(child_rels); + WRITE_NODE_FIELD(partition_infos); + WRITE_NODE_FIELD(partitioned_relids); + WRITE_NODE_FIELD(leaf_part_infos); + WRITE_NODE_FIELD(orig_leaf_part_oids); } static void @@ -4043,8 +4079,17 @@ outNode(StringInfo str, const void *obj) case T_AppendRelInfo: _outAppendRelInfo(str, obj); break; - case T_PartitionedChildRelInfo: - _outPartitionedChildRelInfo(str, obj); + case T_PartitionInfo: + _outPartitionInfo(str, obj); + break; + case T_LeafPartitionInfo: + _outLeafPartitionInfo(str, obj); + break; + case T_PartitionAppendInfo: + _outPartitionAppendInfo(str, obj); + break; + case T_PartitionRootInfo: + _outPartitionRootInfo(str, obj); break; case T_PlaceHolderInfo: _outPlaceHolderInfo(str, obj); diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c index 2d7e1d84d0..c9c0b85cd9 100644 --- a/src/backend/optimizer/path/allpaths.c +++ b/src/backend/optimizer/path/allpaths.c @@ -20,6 +20,7 @@ #include "access/sysattr.h" #include "access/tsmapi.h" +#include "catalog/partition.h" #include "catalog/pg_class.h" #include "catalog/pg_operator.h" #include "catalog/pg_proc.h" @@ -43,6 +44,8 @@ #include "parser/parse_clause.h" #include "parser/parsetree.h" #include "rewrite/rewriteManip.h" +#include "storage/lmgr.h" +#include "utils/builtins.h" #include "utils/lsyscache.h" @@ -334,7 +337,7 @@ set_rel_size(PlannerInfo *root, RelOptInfo *rel, */ set_dummy_rel_pathlist(rel); } - else if (rte->inh) + else if (rte->inh || rte->relkind == RELKIND_PARTITIONED_TABLE) { /* It's an "append relation", process accordingly */ set_append_rel_size(root, rel, rti, rte); @@ -425,7 +428,7 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, { /* We already proved the relation empty, so nothing more to do */ } - else if (rte->inh) + else if (rte->inh || rte->relkind == RELKIND_PARTITIONED_TABLE) { /* It's an "append relation", process accordingly */ set_append_rel_pathlist(root, rel, rti, rte); @@ -845,6 +848,166 @@ 
set_foreign_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte) } /* + * get_rel_partitions_recurse + * Find partitions of the partitioned table described in partinfo, + * recursing for those partitions that are themselves partitioned tables + * + * rootrel is the root of the partition tree of which this table is a part. + * We create a PartitionAppendInfo for this partitioned table and append it to + * rootrel->live_partition_painfos. + * + * List of the leaf partitions of this table will be returned. + */ +static List * +get_rel_partitions_recurse(RelOptInfo *rootrel, + PartitionInfo *partinfo, + PartitionInfo **all_partinfos, + LeafPartitionInfo **leaf_part_infos) +{ + PartitionAppendInfo *painfo; + List *indexes; + List *result = NIL, + *my_live_partitions = NIL; + ListCell *l; + + /* + * Create a PartitionAppendInfo to map this table to the child tables + * that will be its Append children. + */ + painfo = makeNode(PartitionAppendInfo); + painfo->parent_relid = partinfo->relid; + + /* They will all be under the root table's Append node. */ + rootrel->live_partition_painfos = lappend(rootrel->live_partition_painfos, + painfo); + + /* + * TODO: collect the keys by looking at the clauses in + * rootrel->baserestrictinfo considering this table's partition keys. + */ + + /* Ask partition.c which partitions it thinks match the keys. */ + indexes = get_partitions_for_keys(partinfo->pd); + + /* Collect leaf partitions in the result list and recurse for others. 
*/ + foreach(l, indexes) + { + int index = lfirst_int(l); + + if (index >= 0) + { + LeafPartitionInfo *lpinfo = leaf_part_infos[index]; + + result = lappend_oid(result, lpinfo->reloid); + my_live_partitions = lappend_int(my_live_partitions, + lpinfo->relid); + } + else + { + PartitionInfo *recurse_partinfo = all_partinfos[-index]; + List *my_leaf_partitions; + + my_live_partitions = lappend_int(my_live_partitions, + recurse_partinfo->relid); + my_leaf_partitions = get_rel_partitions_recurse(rootrel, + recurse_partinfo, + all_partinfos, + leaf_part_infos); + result = list_concat(result, my_leaf_partitions); + } + } + + painfo->live_partition_relids = my_live_partitions; + + return result; +} + +/* + * get_rel_partitions + * Recursively find partitions of rel + */ +static List * +get_rel_partitions(RelOptInfo *rel) +{ + return get_rel_partitions_recurse(rel, + rel->partition_infos[0], + rel->partition_infos, + rel->leaf_part_infos); +} + +/* + * find_partitions_for_query + * Find and lock partitions of rel relevant to this query + * + * Note that we only ever need to lock the leaf partitions, because the + * partitioned tables in the partition tree have already been locked. + */ +static void +find_partitions_for_query(PlannerInfo *root, RelOptInfo *rel) +{ + List *leaf_part_oids = NIL; + ListCell *l; + PlanRowMark *rc = NULL; + int lockmode; + int num_leaf_parts, + i; + Oid *leaf_part_oids_array; + PartitionRootInfo *prinfo = NULL; + + /* Find partitions. */ + Assert(rel->partition_infos != NULL); + leaf_part_oids = get_rel_partitions(rel); + + /* Convert the list to an array and sort for binary searching later. */ + num_leaf_parts = list_length(leaf_part_oids); + leaf_part_oids_array = (Oid *) palloc(num_leaf_parts * sizeof(Oid)); + i = 0; + foreach(l, leaf_part_oids) + { + leaf_part_oids_array[i++] = lfirst_oid(l); + } + qsort(leaf_part_oids_array, num_leaf_parts, sizeof(Oid), oid_cmp); + + /* + * Now lock partitions. 
Note that rel cannot be a result relation or we + * wouldn't be here (inheritance_planner is where result relations go). + */ + rc = get_plan_rowmark(root->rowMarks, rel->relid); + if (rc && RowMarkRequiresRowShareLock(rc->markType)) + lockmode = RowShareLock; + else + lockmode = AccessShareLock; + + /* + * We lock leaf partitions in the order in which find_all_inheritors + * found them in expand_inherited_rtentry(). Find that list by locating + * the PartitionRootInfo for this table. + */ + foreach(l, root->prinfo_list) + { + prinfo = lfirst(l); + + if (rel->relid == prinfo->parent_relid) + break; + } + Assert(prinfo != NULL && rel->relid == prinfo->parent_relid); + foreach(l, prinfo->orig_leaf_part_oids) + { + Oid relid = lfirst_oid(l); + Oid *test; + + /* Will this leaf partition be scanned? */ + test = (Oid *) bsearch(&relid, + leaf_part_oids_array, + num_leaf_parts, + sizeof(Oid), oid_cmp); + /* Yep, so lock. */ + if (test != NULL) + LockRelationOid(relid, lockmode); + } +} + +/* * set_append_rel_size * Set size estimates for a simple "append relation" * @@ -866,6 +1029,134 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel, double *parent_attrsizes; int nattrs; ListCell *l; + List *rel_appinfos = NIL; + + /* + * Collect a list of child AppendRelInfos, which in the non-partitioned + * case will be found in root->append_rel_list. In the partitioned + * table's case, we didn't build any AppendRelInfos yet. We will + * build them after figuring out which of the table's child tables + * (aka partitions) will need to be scanned for this query. 
+ */ + if (rte->relkind != RELKIND_PARTITIONED_TABLE) + { + foreach(l, root->append_rel_list) + { + AppendRelInfo *appinfo = lfirst(l); + + /* append_rel_list contains all append rels; ignore others */ + if (appinfo->parent_relid == parentRTindex) + rel_appinfos = lappend(rel_appinfos, appinfo); + } + } + else + { + List *live_partitions; + Relation parent; + List *parent_vars; + RelOptInfo *rootrel; + + /* + * If this is a partitioned table root, we will determine all the + * partitions in this partition tree that we need to scan for this + * query. Among those, partitions that have not yet been locked (viz. + * the leaf partitions), will be. + */ + if (rel->partition_infos != NULL) + { + PartitionAppendInfo *painfo; + + rootrel = rel; + find_partitions_for_query(root, rel); + painfo = linitial(rel->live_partition_painfos); + Assert(rti == painfo->parent_relid); + live_partitions = painfo->live_partition_relids; + } + else + { + /* + * Just need to get hold of the PartitionAppendInfo via the root + * parent's RelOptInfo. + */ + rootrel = root->simple_rel_array[rel->root_parent_relid]; + foreach(l, rootrel->live_partition_painfos) + { + PartitionAppendInfo *painfo = lfirst(l); + + if (rti == painfo->parent_relid) + { + live_partitions = painfo->live_partition_relids; + break; + } + } + } + + /* + * Create an AppendRelInfo and a RelOptInfo for every candidate + * partition. + */ + parent = heap_open(rte->relid, NoLock); + parent_vars = build_rel_vars(rte, rti); + foreach(l, live_partitions) + { + Index childRTindex = lfirst_int(l); + RangeTblEntry *childrte = planner_rt_fetch(childRTindex, root); + Relation child; + AppendRelInfo *appinfo; + RelOptInfo *childrel; + + child = heap_open(childrte->relid, NoLock); /* already locked! 
*/ + appinfo = makeNode(AppendRelInfo); + appinfo->parent_relid = rti; + appinfo->child_relid = childRTindex; + appinfo->parent_reltype = parent->rd_rel->reltype; + appinfo->child_reltype = child->rd_rel->reltype; + appinfo->translated_vars = map_partition_varattnos(parent_vars, + rti, + child, parent, + NULL); + ChangeVarNodes((Node *) appinfo->translated_vars, + rti, childRTindex, 0); + appinfo->parent_reloid = rte->relid; + rel_appinfos = lappend(rel_appinfos, appinfo); + root->append_rel_list = lappend(root->append_rel_list, appinfo); + + /* + * Translate the column permissions bitmaps to the child's attnums + * (we have to build the translated_vars list before we can do + * this). But if this is the parent table, leave copyObject's + * result alone. + * + * Note: we need to do this even though the executor won't run any + * permissions checks on the child RTE. The + * insertedCols/updatedCols bitmaps may be examined for + * trigger-firing purposes. + */ + childrte->selectedCols = translate_col_privs(rte->selectedCols, + appinfo->translated_vars); + childrte->insertedCols = translate_col_privs(rte->insertedCols, + appinfo->translated_vars); + childrte->updatedCols = translate_col_privs(rte->updatedCols, + appinfo->translated_vars); + + childrel = build_simple_rel(root, childRTindex, rel); + childrel->root_parent_relid = rootrel->relid; + Assert(childrel->reloptkind == RELOPT_OTHER_MEMBER_REL); + + /* Copy the data that create_lateral_join_info() created */ + Assert(childrel->direct_lateral_relids == NULL); + childrel->direct_lateral_relids = rel->direct_lateral_relids; + Assert(childrel->lateral_relids == NULL); + childrel->lateral_relids = rel->lateral_relids; + Assert(childrel->lateral_referencers == NULL); + childrel->lateral_referencers = rel->lateral_referencers; + + root->total_table_pages += childrel->pages; + + heap_close(child, NoLock); + } + heap_close(parent, NoLock); + } Assert(IS_SIMPLE_REL(rel)); @@ -889,7 +1180,7 @@ 
set_append_rel_size(PlannerInfo *root, RelOptInfo *rel, nattrs = rel->max_attr - rel->min_attr + 1; parent_attrsizes = (double *) palloc0(nattrs * sizeof(double)); - foreach(l, root->append_rel_list) + foreach(l, rel_appinfos) { AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l); int childRTindex; @@ -902,10 +1193,6 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel, ListCell *childvars; ListCell *lc; - /* append_rel_list contains all append rels; ignore others */ - if (appinfo->parent_relid != parentRTindex) - continue; - childRTindex = appinfo->child_relid; childRTE = root->simple_rte_array[childRTindex]; @@ -1211,24 +1498,61 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, int parentRTindex = rti; List *live_childrels = NIL; ListCell *l; + List *append_rel_children = NIL; + + if (rte->relkind != RELKIND_PARTITIONED_TABLE) + { + foreach(l, root->append_rel_list) + { + AppendRelInfo *appinfo = lfirst(l); + + /* append_rel_list contains all append rels; ignore others */ + if (appinfo->parent_relid == parentRTindex) + append_rel_children = lappend_int(append_rel_children, + appinfo->child_relid); + } + } + else + { + /* For a partitioned table, first find its PartitionAppendInfo */ + if (rel->live_partition_painfos != NIL) + { + PartitionAppendInfo *painfo; + + /* This is the root partitioned rel. */ + painfo = linitial(rel->live_partition_painfos); + append_rel_children = painfo->live_partition_relids; + } + else + { + RelOptInfo *rootrel; + + /* Non-root partitioned table. Get it from the root rel. */ + rootrel = root->simple_rel_array[rel->root_parent_relid]; + foreach(l, rootrel->live_partition_painfos) + { + PartitionAppendInfo *painfo = lfirst(l); + + if (rti == painfo->parent_relid) + { + append_rel_children = painfo->live_partition_relids; + break; + } + } + } + } /* * Generate access paths for each member relation, and remember the * non-dummy children. 
*/ - foreach(l, root->append_rel_list) + foreach(l, append_rel_children) { - AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l); - int childRTindex; + int childRTindex = lfirst_int(l); RangeTblEntry *childRTE; RelOptInfo *childrel; - /* append_rel_list contains all append rels; ignore others */ - if (appinfo->parent_relid != parentRTindex) - continue; - /* Re-locate the child RTE and RelOptInfo */ - childRTindex = appinfo->child_relid; childRTE = root->simple_rte_array[childRTindex]; childrel = root->simple_rel_array[childRTindex]; @@ -1289,7 +1613,8 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte; rte = planner_rt_fetch(rel->relid, root); - if (rte->relkind == RELKIND_PARTITIONED_TABLE) + /* Note that only a root partitioned table would have inh flag set. */ + if (rte->relkind == RELKIND_PARTITIONED_TABLE && rte->inh) { partitioned_rels = get_partitioned_child_rels(root, rel->relid); /* The root partitioned table is included as a child rel */ diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c index fdef00ab39..09dd32de79 100644 --- a/src/backend/optimizer/plan/planner.c +++ b/src/backend/optimizer/plan/planner.c @@ -514,7 +514,7 @@ subquery_planner(PlannerGlobal *glob, Query *parse, root->multiexpr_params = NIL; root->eq_classes = NIL; root->append_rel_list = NIL; - root->pcinfo_list = NIL; + root->prinfo_list = NIL; root->rowMarks = NIL; memset(root->upper_rels, 0, sizeof(root->upper_rels)); memset(root->upper_targets, 0, sizeof(root->upper_targets)); @@ -1050,6 +1050,93 @@ inheritance_planner(PlannerInfo *root) Index rti; RangeTblEntry *parent_rte; List *partitioned_rels = NIL; + List *rel_appinfos = NIL; + ListCell *l; + + parent_rte = rt_fetch(parentRTindex, root->parse->rtable); + if (parent_rte->relkind != RELKIND_PARTITIONED_TABLE) + { + foreach(l, root->append_rel_list) + { + AppendRelInfo *appinfo = lfirst(l); + + /* append_rel_list contains all append rels; ignore others */ + if 
(appinfo->parent_relid == parentRTindex) + rel_appinfos = lappend(rel_appinfos, appinfo); + } + } + else + { + PartitionRootInfo *prinfo = NULL; + Relation parent; + List *parent_vars = build_rel_vars(parent_rte, parentRTindex); + + /* Find the PartitionRootInfo for this rel */ + foreach(l, root->prinfo_list) + { + prinfo = lfirst(l); + + if (prinfo->parent_relid == parentRTindex) + break; + } + Assert(prinfo != NULL && prinfo->parent_relid == parentRTindex); + + parent = heap_open(parent_rte->relid, NoLock); + foreach(l, prinfo->leaf_part_infos) + { + LeafPartitionInfo *lpinfo = lfirst(l); + Index childRTindex = lpinfo->relid; + RangeTblEntry *childrte = planner_rt_fetch(childRTindex, root); + Relation child; + AppendRelInfo *appinfo; + + if (childrte->relkind == RELKIND_PARTITIONED_TABLE) + continue; + + /* + * We'll need RowExclusiveLock, because just like the parent, each + * child is a result relation. + */ + child = heap_open(childrte->relid, RowExclusiveLock); + appinfo = makeNode(AppendRelInfo); + appinfo->parent_relid = parentRTindex; + appinfo->child_relid = childRTindex; + appinfo->parent_reltype = parent->rd_rel->reltype; + appinfo->child_reltype = child->rd_rel->reltype; + appinfo->translated_vars = map_partition_varattnos(parent_vars, + parentRTindex, + child, parent, + NULL); + ChangeVarNodes((Node *) appinfo->translated_vars, + parentRTindex, childRTindex, 0); + appinfo->parent_reloid = RelationGetRelid(parent); + rel_appinfos = lappend(rel_appinfos, appinfo); + root->append_rel_list = lappend(root->append_rel_list, appinfo); + + /* + * Translate the column permissions bitmaps to the child's attnums + * (we have to build the translated_vars list before we can do + * this). But if this is the parent table, leave copyObject's + * result alone. + * + * Note: we need to do this even though the executor won't run any + * permissions checks on the child RTE. The + * insertedCols/updatedCols bitmaps may be examined for + * trigger-firing purposes. 
+ */ + childrte->selectedCols = + translate_col_privs(parent_rte->selectedCols, + appinfo->translated_vars); + childrte->insertedCols = + translate_col_privs(parent_rte->insertedCols, + appinfo->translated_vars); + childrte->updatedCols = + translate_col_privs(parent_rte->updatedCols, + appinfo->translated_vars); + heap_close(child, NoLock); + } + heap_close(parent, NoLock); + } Assert(parse->commandType != CMD_INSERT); @@ -1115,14 +1202,13 @@ inheritance_planner(PlannerInfo *root) * opposite in the case of non-partitioned inheritance parent as described * below. */ - parent_rte = rt_fetch(parentRTindex, root->parse->rtable); if (parent_rte->relkind == RELKIND_PARTITIONED_TABLE) nominalRelation = parentRTindex; /* * And now we can get on with generating a plan for each child table. */ - foreach(lc, root->append_rel_list) + foreach(lc, rel_appinfos) { AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(lc); PlannerInfo *subroot; @@ -1130,10 +1216,6 @@ inheritance_planner(PlannerInfo *root) RelOptInfo *sub_final_rel; Path *subpath; - /* append_rel_list contains all append rels; ignore others */ - if (appinfo->parent_relid != parentRTindex) - continue; - /* * We need a working copy of the PlannerInfo so that we can control * propagation of information back to the main copy. @@ -6070,7 +6152,7 @@ plan_cluster_use_sort(Oid tableOid, Oid indexOid) * Returns a list of the RT indexes of the partitioned child relations * with rti as the root parent RT index. * - * Note: Only call this function on RTEs known to be partitioned tables. + * Note: Only call this function on RTEs known to be a root partitioned table. 
*/ List * get_partitioned_child_rels(PlannerInfo *root, Index rti) @@ -6078,13 +6160,13 @@ get_partitioned_child_rels(PlannerInfo *root, Index rti) List *result = NIL; ListCell *l; - foreach(l, root->pcinfo_list) + foreach(l, root->prinfo_list) { - PartitionedChildRelInfo *pc = lfirst(l); + PartitionRootInfo *prinfo = lfirst(l); - if (pc->parent_relid == rti) + if (prinfo->parent_relid == rti) { - result = pc->child_rels; + result = prinfo->partitioned_relids; break; } } diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c index ee2e066263..4b4d95eb63 100644 --- a/src/backend/optimizer/prep/prepunion.c +++ b/src/backend/optimizer/prep/prepunion.c @@ -105,8 +105,6 @@ static void make_inh_translation_list(Relation oldrelation, Relation newrelation, Index newvarno, List **translated_vars); -static Bitmapset *translate_col_privs(const Bitmapset *parent_privs, - List *translated_vars); static Node *adjust_appendrel_attrs_mutator(Node *node, adjust_appendrel_attrs_context *context); static Relids adjust_child_relids(Relids relids, int nappinfos, @@ -1352,11 +1350,19 @@ expand_inherited_tables(PlannerInfo *root) /* * expand_inherited_rtentry - * Check whether a rangetable entry represents an inheritance set. - * If so, add entries for all the child tables to the query's - * rangetable, and build AppendRelInfo nodes for all the child tables - * and add them to root->append_rel_list. If not, clear the entry's - * "inh" flag to prevent later code from looking for AppendRelInfos. + * Perform actions necessary for applying this query to an inheritance + * set if the rte represents one + * + * That includes adding entries for all the child tables to the query's + * rangetable. Also, if this query requires a PlanRowMark, generate the same + * for each child table and append them to the planner's global list + * (root->rowMarks). If the inheritance set is really a partitioned table, + * our work here is done. 
If not, we also create AppendRelInfo nodes for + * all the child tables and add them to root->append_rel_list. + * + * If it turns out that the rte is not (or no longer) an inheritance set, + * clear the entry's "inh" flag to prevent later code from looking for + * AppendRelInfos. * * Note that the original RTE is considered to represent the whole * inheritance set. The first of the generated RTEs is an RTE for the same @@ -1381,9 +1387,13 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti) List *inhOIDs; List *appinfos; ListCell *l; - bool has_child; - PartitionedChildRelInfo *pcinfo; List *partitioned_child_rels = NIL; + List *partition_infos = NIL; + List *leaf_part_infos = NIL; + List *orig_leaf_part_oids; + int num_partitioned_children; + PartitionedTableInfo *ptinfo; + PartitionInfo *pinfo; /* Does RT entry allow inheritance? */ if (!rte->inh) @@ -1408,6 +1418,11 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti) * relation named in the query. However, for each child relation we add * to the query, we must obtain an appropriate lock, because this will be * the first use of those relations in the parse/rewrite/plan pipeline. + * For a partitioned table, we defer locking non-partitioned child tables + * to when we actually know that it will be scanned (see below that we + * use RelationGetPartitionDispatchInfo() to get the list of child tables + * of partitioned tables, not find_all_inheritors() which would lock the + * child tables.) * * If the parent relation is the query's result relation, then we need * RowExclusiveLock. 
Otherwise, if it's accessed FOR UPDATE/SHARE, we @@ -1425,7 +1440,8 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti) lockmode = AccessShareLock; /* Scan for all members of inheritance set, acquire needed locks */ - inhOIDs = find_all_inheritors(parentOID, lockmode, false, NULL, NULL); + inhOIDs = find_all_inheritors(parentOID, lockmode, true, NULL, + &num_partitioned_children); /* * Check that there's at least one descendant, else treat as no-child @@ -1461,9 +1477,17 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti) { List *leaf_part_oids, *ptinfos; + int rtable_length = list_length(parse->rtable), + i; + + /* + * Keep leaf partition OIDs around so that we can lock them in this + * order when we eventually do it. + */ + orig_leaf_part_oids = list_copy_tail(inhOIDs, + num_partitioned_children + 1); - /* Discard the original list. */ - list_free(inhOIDs); + /* Discard the original inhOIDs list. */ inhOIDs = NIL; /* Request partitioning information. */ @@ -1471,14 +1495,37 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti) &leaf_part_oids); /* - * First collect the partitioned child table OIDs, which includes the - * root parent at the head. + * We make a PartitionInfo object for every partitioned table in the + * tree, including the root table. We create the root table's + * PartitionInfo outside the loop, because we'd like to use its + * original RT index, whereas for the child partitioned tables, we'll + * use their to-be RT indexes. */ + ptinfo = linitial(ptinfos); + pinfo = makeNode(PartitionInfo); + pinfo->relid = rti; + pinfo->pd = ptinfo->pd; + partition_infos = list_make1(pinfo); + + /* Let there remain only the child tables' PartitionedTableInfo's */ + ptinfos = list_delete_first(ptinfos); + + /* + * First collect the partitioned child table OIDs. Note that the list + * won't contain the root table's OID because we removed its ptinfo + * from the list above. 
+ */ + i = 1; foreach(l, ptinfos) { PartitionedTableInfo *ptinfo = lfirst(l); + PartitionInfo *pinfo = makeNode(PartitionInfo); inhOIDs = lappend_oid(inhOIDs, ptinfo->relid); + pinfo->relid = rtable_length + i; + pinfo->pd = ptinfo->pd; + partition_infos = lappend(partition_infos, pinfo); + i++; } /* Concatenate the leaf partition OIDs. */ @@ -1487,7 +1534,6 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti) /* Scan the inheritance set and expand it */ appinfos = NIL; - has_child = false; foreach(l, inhOIDs) { Oid childOID = lfirst_oid(l); @@ -1496,23 +1542,14 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti) Index childRTindex; AppendRelInfo *appinfo; - /* Open rel if needed; we already have required locks */ - if (childOID != parentOID) - newrelation = heap_open(childOID, NoLock); - else - newrelation = oldrelation; - /* * It is possible that the parent table has children that are temp * tables of other backends. We cannot safely access such tables * (because of buffering issues), and the best thing to do seems to be * to silently ignore them. */ - if (childOID != parentOID && RELATION_IS_OTHER_TEMP(newrelation)) - { - heap_close(newrelation, lockmode); + if (childOID != parentOID && rel_is_other_temp(childOID)) continue; - } /* * Build an RTE for the child, and attach to query's rangetable list. @@ -1528,7 +1565,7 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti) */ childrte = copyObject(rte); childrte->relid = childOID; - childrte->relkind = newrelation->rd_rel->relkind; + childrte->relkind = get_rel_relkind(childOID); childrte->inh = false; childrte->requiredPerms = 0; childrte->securityQuals = NIL; @@ -1536,51 +1573,6 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti) childRTindex = list_length(parse->rtable); /* - * Build an AppendRelInfo for this parent and child, unless the child - * is a partitioned table. 
- */ - if (childrte->relkind != RELKIND_PARTITIONED_TABLE) - { - /* Remember if we saw a real child. */ - if (childOID != parentOID) - has_child = true; - - appinfo = makeNode(AppendRelInfo); - appinfo->parent_relid = rti; - appinfo->child_relid = childRTindex; - appinfo->parent_reltype = oldrelation->rd_rel->reltype; - appinfo->child_reltype = newrelation->rd_rel->reltype; - make_inh_translation_list(oldrelation, newrelation, childRTindex, - &appinfo->translated_vars); - appinfo->parent_reloid = parentOID; - appinfos = lappend(appinfos, appinfo); - - /* - * Translate the column permissions bitmaps to the child's attnums - * (we have to build the translated_vars list before we can do - * this). But if this is the parent table, leave copyObject's - * result alone. - * - * Note: we need to do this even though the executor won't run any - * permissions checks on the child RTE. The - * insertedCols/updatedCols bitmaps may be examined for - * trigger-firing purposes. - */ - if (childOID != parentOID) - { - childrte->selectedCols = translate_col_privs(rte->selectedCols, - appinfo->translated_vars); - childrte->insertedCols = translate_col_privs(rte->insertedCols, - appinfo->translated_vars); - childrte->updatedCols = translate_col_privs(rte->updatedCols, - appinfo->translated_vars); - } - } - else - partitioned_child_rels = lappend_int(partitioned_child_rels, - childRTindex); - - /* * Build a PlanRowMark if parent is marked FOR UPDATE/SHARE. */ if (oldrc) @@ -1604,12 +1596,78 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti) */ newrc->isParent = (childrte->relkind == RELKIND_PARTITIONED_TABLE); - /* Include child's rowmark type in parent's allMarkTypes */ - oldrc->allMarkTypes |= newrc->allMarkTypes; root->rowMarks = lappend(root->rowMarks, newrc); } + /* + * No need to create AppendRelInfo for partitions at this point, + * because we don't know yet if it will actually be scanned by this + * query. 
The fact that this is a partition of the parent table + * will be recorded in the PartitionInfo created for the parent + * table. + */ + if (rel_is_partition(childOID) && + childrte->relkind != RELKIND_PARTITIONED_TABLE) + { + LeafPartitionInfo *lpinfo = makeNode(LeafPartitionInfo); + + lpinfo->reloid = childOID; + lpinfo->relid = childRTindex; + leaf_part_infos = lappend(leaf_part_infos, lpinfo); + continue; + } + + if (childrte->relkind == RELKIND_PARTITIONED_TABLE) + { + partitioned_child_rels = lappend_int(partitioned_child_rels, + childRTindex); + continue; + } + + /* + * This must be a non-partitioned child table that is not a partition. + * Build an AppendRelInfo for the same to remember the parent-child + * relationship. + */ + + /* Open rel if needed, we already have required locks */ + if (childOID != parentOID) + newrelation = heap_open(childOID, NoLock); + else + newrelation = oldrelation; + + appinfo = makeNode(AppendRelInfo); + appinfo->parent_relid = rti; + appinfo->child_relid = childRTindex; + appinfo->parent_reltype = oldrelation->rd_rel->reltype; + appinfo->child_reltype = newrelation->rd_rel->reltype; + make_inh_translation_list(oldrelation, newrelation, childRTindex, + &appinfo->translated_vars); + appinfo->parent_reloid = parentOID; + appinfos = lappend(appinfos, appinfo); + + /* + * Translate the column permissions bitmaps to the child's attnums + * (we have to build the translated_vars list before we can do + * this). But if this is the parent table, leave copyObject's + * result alone. + * + * Note: we need to do this even though the executor won't run any + * permissions checks on the child RTE. The + * insertedCols/updatedCols bitmaps may be examined for + * trigger-firing purposes. 
+ */ + if (childOID != parentOID) + { + childrte->selectedCols = translate_col_privs(rte->selectedCols, + appinfo->translated_vars); + childrte->insertedCols = translate_col_privs(rte->insertedCols, + appinfo->translated_vars); + childrte->updatedCols = translate_col_privs(rte->updatedCols, + appinfo->translated_vars); + } + /* Close child relations, but keep locks */ if (childOID != parentOID) heap_close(newrelation, NoLock); @@ -1618,35 +1676,53 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti) heap_close(oldrelation, NoLock); /* - * If all the children were temp tables or a partitioned parent did not - * have any leaf partitions, pretend it's a non-inheritance situation; we - * don't need Append node in that case. The duplicate RTE we added for - * the parent table is harmless, so we don't bother to get rid of it; - * ditto for the useless PlanRowMark node. + * We keep a list of objects in root, each of which maps a partitioned + * parent RT index to a bunch of information about the partition tree + * rooted at that parent. The information includes a list of RT indexes + * of partitioned tables appearing in the tree, a list of PartitionInfo + * objects for each such partitioned table, a list of LeafPartitionInfo + * objects for each leaf partition in tree, and finally a list containing + * leaf partition OIDs in an order in which find_all_inheritors() returned + * them. The first of these is used when creating an Append or a + * ModifyTable path for the parent to be copied verbatim into the path + * (and subsequently the plan) so that it could be carried over to the + * executor. That list is the only place where the executor could find + * partitioned child tables to lock them. 
*/ - if (!has_child) + if (rte->relkind == RELKIND_PARTITIONED_TABLE) { - /* Clear flag before returning */ - rte->inh = false; + PartitionRootInfo *prinfo = makeNode(PartitionRootInfo); + + Assert(list_length(partition_infos) >= 1); + prinfo->parent_relid = rti; + /* + * Be sure to include the parent's RT index, because the above code + * didn't. + */ + prinfo->partitioned_relids = lcons_int(rti, partitioned_child_rels); + prinfo->partition_infos = partition_infos; + prinfo->leaf_part_infos = leaf_part_infos; + prinfo->orig_leaf_part_oids = orig_leaf_part_oids; + + root->prinfo_list = lappend(root->prinfo_list, prinfo); + + /* + * Our job here is done, because we didn't create any AppendRelInfos. + */ return; } /* - * We keep a list of objects in root, each of which maps a partitioned - * parent RT index to the list of RT indexes of its partitioned child - * tables. When creating an Append or a ModifyTable path for the parent, - * we copy the child RT index list verbatim to the path so that it could - * be carried over to the executor so that the latter could identify the - * partitioned child tables. + * If all the children were temp tables, pretend it's a non-inheritance + * situation; we don't need Append node in that case. The duplicate + * RTE we added for the parent table is harmless, so we don't bother to + * get rid of it; ditto for the useless PlanRowMark node. */ - if (partitioned_child_rels != NIL) + if (list_length(appinfos) < 2) { - pcinfo = makeNode(PartitionedChildRelInfo); - - Assert(rte->relkind == RELKIND_PARTITIONED_TABLE); - pcinfo->parent_relid = rti; - pcinfo->child_rels = partitioned_child_rels; - root->pcinfo_list = lappend(root->pcinfo_list, pcinfo); + /* Clear flag before returning */ + rte->inh = false; + return; } /* Otherwise, OK to add to root->append_rel_list */ @@ -1767,7 +1843,7 @@ make_inh_translation_list(Relation oldrelation, Relation newrelation, * query is really only going to reference the inherited columns. 
Instead * we set the per-column bits for all inherited columns. */ -static Bitmapset * +Bitmapset * translate_col_privs(const Bitmapset *parent_privs, List *translated_vars) { diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c index a1ebd4acc8..5607a4e4e0 100644 --- a/src/backend/optimizer/util/plancat.c +++ b/src/backend/optimizer/util/plancat.c @@ -1577,6 +1577,50 @@ build_physical_tlist(PlannerInfo *root, RelOptInfo *rel) } /* + * build_rel_vars + * + * Returns a list containing Var expressions corresponding to a relation's + * attributes. Since the caller may already have the RangeTblEntry, we + * pass it instead of PlannerInfo to avoid finding it in the range + * table all over again. + */ +List * +build_rel_vars(RangeTblEntry *rte, Index relid) +{ + Relation relation; + AttrNumber attrno; + int numattrs; + List *result = NIL; + + Assert(rte->rtekind == RTE_RELATION); + + /* Assume we already have adequate lock */ + relation = heap_open(rte->relid, NoLock); + + numattrs = RelationGetNumberOfAttributes(relation); + for (attrno = 1; attrno <= numattrs; attrno++) + { + Form_pg_attribute att_tup = TupleDescAttr(relation->rd_att, + attrno - 1); + + if (att_tup->attisdropped) + continue; + + result = lappend(result, + makeVar(relid, + attrno, + att_tup->atttypid, + att_tup->atttypmod, + att_tup->attcollation, + 0)); + + } + + heap_close(relation, NoLock); + return result; +} + +/* * build_index_tlist * * Build a targetlist representing the columns of the specified index.
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c index 8ad0b4a669..4cc32dea8d 100644 --- a/src/backend/optimizer/util/relnode.c +++ b/src/backend/optimizer/util/relnode.c @@ -16,7 +16,9 @@ #include <limits.h> +#include "catalog/pg_class.h" #include "miscadmin.h" +#include "nodes/relation.h" #include "optimizer/clauses.h" #include "optimizer/cost.h" #include "optimizer/pathnode.h" @@ -146,6 +148,15 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent) rel->baserestrict_min_security = UINT_MAX; rel->joininfo = NIL; rel->has_eclass_joins = false; + /* Set in build_simple_rel if rel is root partitioned table */ + rel->num_parted = 0; + rel->partition_infos = NULL; + rel->num_leaf_parts = 0; + rel->leaf_part_infos = NULL; + /* Set in get_rel_partitions_recurse */ + rel->live_partition_painfos = NIL; + /* Set in set_append_rel_size if rel is a partition. */ + rel->root_parent_relid = 0; /* * Pass top parent's relids down the inheritance hierarchy. If the parent @@ -210,25 +221,73 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent) list_length(rte->securityQuals)); /* - * If this rel is an appendrel parent, recurse to build "other rel" - * RelOptInfos for its children. They are "other rels" because they are - * not in the main join tree, but we will need RelOptInfos to plan access - * to them. + * If this rel is an appendrel parent, generate additional information + * based on whether the parent is a partitioned table or not. For + * regular parent tables, recurse to build "other rel" RelOptInfos for its + * children. They are "other rels" because they are not in the main join + * tree, but we will need RelOptInfos to plan access to them. For + * partitioned parent tables, we do not yet create "other rel" RelOptInfos + * for the children. Instead, we set up some information that will be + * used in set_append_rel_size() to look up its partitions.
*/ if (rte->inh) { ListCell *l; - foreach(l, root->append_rel_list) + if (rte->relkind == RELKIND_PARTITIONED_TABLE) { - AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l); + PartitionRootInfo *prinfo = NULL; + LeafPartitionInfo **lpinfos; + int i; + + foreach(l, root->prinfo_list) + { + prinfo = lfirst(l); + if (prinfo->parent_relid == relid) + break; + } + Assert(prinfo != NULL && prinfo->parent_relid == relid); + + rel->num_parted = list_length(prinfo->partition_infos); + rel->num_leaf_parts = list_length(prinfo->leaf_part_infos); + rel->partition_infos = (PartitionInfo **) + palloc0(rel->num_parted * + sizeof(PartitionInfo *)); + lpinfos = (LeafPartitionInfo **) palloc0(rel->num_leaf_parts * + sizeof(LeafPartitionInfo *)); + i = 0; + foreach(l, prinfo->partition_infos) + { + rel->partition_infos[i++] = lfirst(l); + } + i = 0; + foreach(l, prinfo->leaf_part_infos) + { + lpinfos[i++] = lfirst(l); + } + rel->leaf_part_infos = lpinfos; - /* append_rel_list contains all append rels; ignore others */ - if (appinfo->parent_relid != relid) - continue; + /* + * Don't build RelOptInfo for partitions yet; we don't know which + * ones we'll need. We did create RangeTblEntry's though, so we + * have an empty slot in root->simple_rel_array that will be + * filled eventually if the respective partition is chosen to be + * scanned after all. 
+ */ + } + else + { + foreach(l, root->append_rel_list) + { + AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l); + + /* append_rel_list contains all append rels; ignore others */ + if (appinfo->parent_relid != relid) + continue; - (void) build_simple_rel(root, appinfo->child_relid, - rel); + (void) build_simple_rel(root, appinfo->child_relid, + rel); + } } } diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c index 82763f8013..ebbc3da985 100644 --- a/src/backend/utils/cache/lsyscache.c +++ b/src/backend/utils/cache/lsyscache.c @@ -1817,6 +1817,28 @@ get_rel_relkind(Oid relid) } /* + * rel_is_partition + * + * Returns whether a given relation is a partition. + */ +bool +rel_is_partition(Oid relid) +{ + HeapTuple tp; + Form_pg_class reltup; + bool result; + + tp = SearchSysCache1(RELOID, ObjectIdGetDatum(relid)); + if (!HeapTupleIsValid(tp)) + elog(ERROR, "cache lookup failed for relation %u", relid); + reltup = (Form_pg_class) GETSTRUCT(tp); + result = reltup->relispartition; + ReleaseSysCache(tp); + + return result; +} + +/* * get_rel_tablespace * * Returns the pg_tablespace OID associated with a given relation.
@@ -1865,6 +1887,34 @@ get_rel_persistence(Oid relid) return result; } +/* + * rel_is_other_temp + * + * Returns whether a relation is a temp table from another session + */ +bool +rel_is_other_temp(Oid relid) +{ + HeapTuple tp; + Form_pg_class reltup; + bool result = false; + + tp = SearchSysCache1(RELOID, ObjectIdGetDatum(relid)); + if (!HeapTupleIsValid(tp)) + elog(ERROR, "cache lookup failed for relation %u", relid); + reltup = (Form_pg_class) GETSTRUCT(tp); + + if (reltup->relpersistence == RELPERSISTENCE_TEMP && + !isTempOrTempToastNamespace(reltup->relnamespace)) + { + result = true; + } + + ReleaseSysCache(tp); + + return result; +} + /* ---------- TRANSFORM CACHE ---------- */ diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h index 7b53baf847..b5dcb22688 100644 --- a/src/include/catalog/partition.h +++ b/src/include/catalog/partition.h @@ -16,6 +16,7 @@ #include "fmgr.h" #include "executor/tuptable.h" #include "nodes/execnodes.h" +#include "nodes/relation.h" #include "parser/parse_node.h" #include "utils/rel.h" @@ -87,4 +88,7 @@ extern int get_partition_for_tuple(PartitionTupleRoutingInfo **ptrinfos, EState *estate, PartitionTupleRoutingInfo **failed_at, TupleTableSlot **failed_slot); + +/* Planner support stuff. 
*/ +extern List *get_partitions_for_keys(PartitionDispatch pd); #endif /* PARTITION_H */ diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h index 27bd4f3363..e957615ac6 100644 --- a/src/include/nodes/nodes.h +++ b/src/include/nodes/nodes.h @@ -260,7 +260,10 @@ typedef enum NodeTag T_PlaceHolderVar, T_SpecialJoinInfo, T_AppendRelInfo, - T_PartitionedChildRelInfo, + T_PartitionInfo, + T_LeafPartitionInfo, + T_PartitionAppendInfo, + T_PartitionRootInfo, T_PlaceHolderInfo, T_MinMaxAggInfo, T_PlannerParamItem, diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h index 3ccc9d1b03..71c494a7c2 100644 --- a/src/include/nodes/relation.h +++ b/src/include/nodes/relation.h @@ -251,7 +251,7 @@ typedef struct PlannerInfo List *append_rel_list; /* list of AppendRelInfos */ - List *pcinfo_list; /* list of PartitionedChildRelInfos */ + List *prinfo_list; /* list of PartitionRootInfos */ List *rowMarks; /* list of PlanRowMarks */ @@ -515,6 +515,9 @@ typedef enum RelOptKind /* Is the given relation an "other" relation? */ #define IS_OTHER_REL(rel) ((rel)->reloptkind == RELOPT_OTHER_MEMBER_REL) +typedef struct PartitionInfo PartitionInfo; +typedef struct LeafPartitionInfo LeafPartitionInfo; + typedef struct RelOptInfo { NodeTag type; @@ -592,6 +595,23 @@ typedef struct RelOptInfo /* used by "other" relations */ Relids top_parent_relids; /* Relids of topmost parents */ + + /* Fields set for "root" partitioned relations */ + int num_parted; /* Number of entries in partition_infos */ + PartitionInfo **partition_infos; + int num_leaf_parts; /* Number of entries in leaf_part_infos */ + LeafPartitionInfo **leaf_part_infos; /* LeafPartitionInfos */ + + /* Fields set for partitioned relations (list of PartitionAppendInfo's) */ + List *live_partition_painfos; + + /* Fields set for partition otherrels */ + + /* + * RT index of the root partitioned table in the partition tree of + * which this rel is a member.
+ */ + Index root_parent_relid; } RelOptInfo; /* @@ -2012,24 +2032,73 @@ typedef struct AppendRelInfo Oid parent_reloid; /* OID of parent relation */ } AppendRelInfo; +/* Forward declarations, to avoid including other headers */ +typedef struct PartitionDispatchData *PartitionDispatch; + +/* + * PartitionInfo - information about partitioning of one partitioned table in + * a given partition tree + */ +typedef struct PartitionInfo +{ + NodeTag type; + + Index relid; /* Ordinal position in the rangetable */ + PartitionDispatch pd; /* Information about partitions */ +} PartitionInfo; + +/* + * LeafPartitionInfo - (OID, RT index) pair for one leaf partition + * + * Created when a leaf partition's RT entry is created in + * expand_inherited_rtentry(). + */ +typedef struct LeafPartitionInfo +{ + NodeTag type; + + Oid reloid; /* OID */ + Index relid; /* RT index */ +} LeafPartitionInfo; + /* - * For a partitioned table, this maps its RT index to the list of RT indexes - * of the partitioned child tables in the partition tree. We need to - * separately store this information, because we do not create AppendRelInfos - * for the partitioned child tables of a parent table, since AppendRelInfos - * contain information that is unnecessary for the partitioned child tables. - * The child_rels list must contain at least one element, because the parent - * partitioned table is itself counted as a child. + * PartitionAppendInfo - list of child RT indexes for one partitioned table + * in a given partition tree + */ +typedef struct PartitionAppendInfo +{ + NodeTag type; + + Index parent_relid; + List *live_partition_relids; /* List of RT indexes */ +} PartitionAppendInfo; + +/* + * For a partitioned table, this maps its RT index to the information about + * the partition tree collected in expand_inherited_rtentry(). + * + * That information includes a list of PartitionInfo nodes, one for each + * partitioned table in the partition tree, including for the table itself. 
+ * Also included is a list of RT indexes of the entries for leaf partitions + * that are created at the same time by expand_inherited_rtentry(). + * + * orig_leaf_part_oids contains the list of leaf partition OIDs as it was + * generated by find_all_inheritors(). We keep it around so that we can + * lock leaf partitions in that order when we actually do it. * - * These structs are kept in the PlannerInfo node's pcinfo_list. + * PartitionRootInfo's for different partitioned tables in a query are placed + * in root->prinfo_list. */ -typedef struct PartitionedChildRelInfo +typedef struct PartitionRootInfo { NodeTag type; Index parent_relid; - List *child_rels; -} PartitionedChildRelInfo; + List *partition_infos; + List *partitioned_relids; + List *leaf_part_infos; + List *orig_leaf_part_oids; +} PartitionRootInfo; /* * For each distinct placeholder expression generated during planning, we diff --git a/src/include/optimizer/plancat.h b/src/include/optimizer/plancat.h index 71f0faf938..1e18f609b1 100644 --- a/src/include/optimizer/plancat.h +++ b/src/include/optimizer/plancat.h @@ -39,6 +39,7 @@ extern bool relation_excluded_by_constraints(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte); extern List *build_physical_tlist(PlannerInfo *root, RelOptInfo *rel); +extern List *build_rel_vars(RangeTblEntry *rte, Index relid); extern bool has_unique_index(RelOptInfo *rel, AttrNumber attno); diff --git a/src/include/optimizer/prep.h b/src/include/optimizer/prep.h index 4be0afd566..d0af8dc7bc 100644 --- a/src/include/optimizer/prep.h +++ b/src/include/optimizer/prep.h @@ -16,6 +16,7 @@ #include "nodes/plannodes.h" #include "nodes/relation.h" +#include "utils/rel.h" /* @@ -51,6 +52,8 @@ extern PlanRowMark *get_plan_rowmark(List *rowmarks, Index rtindex); extern RelOptInfo *plan_set_operations(PlannerInfo *root); extern void expand_inherited_tables(PlannerInfo *root); +extern Bitmapset *translate_col_privs(const Bitmapset *parent_privs, + List *translated_vars); extern 
Node *adjust_appendrel_attrs(PlannerInfo *root, Node *node, int nappinfos, AppendRelInfo **appinfos); diff --git a/src/include/utils/lsyscache.h b/src/include/utils/lsyscache.h index 07208b56ce..b5b615a6fa 100644 --- a/src/include/utils/lsyscache.h +++ b/src/include/utils/lsyscache.h @@ -126,8 +126,10 @@ extern char *get_rel_name(Oid relid); extern Oid get_rel_namespace(Oid relid); extern Oid get_rel_type_id(Oid relid); extern char get_rel_relkind(Oid relid); +extern bool rel_is_partition(Oid relid); extern Oid get_rel_tablespace(Oid relid); extern char get_rel_persistence(Oid relid); +extern bool rel_is_other_temp(Oid relid); extern Oid get_transform_fromsql(Oid typid, Oid langid, List *trftypes); extern Oid get_transform_tosql(Oid typid, Oid langid, List *trftypes); extern bool get_typisdefined(Oid typid); diff --git a/src/test/regress/expected/insert.out b/src/test/regress/expected/insert.out index a2d9469592..e159d62b66 100644 --- a/src/test/regress/expected/insert.out +++ b/src/test/regress/expected/insert.out @@ -278,12 +278,12 @@ select tableoid::regclass, * from list_parted; -------------+----+---- part_aa_bb | aA | part_cc_dd | cC | 1 - part_null | | 0 - part_null | | 1 part_ee_ff1 | ff | 1 part_ee_ff1 | EE | 1 part_ee_ff2 | ff | 11 part_ee_ff2 | EE | 10 + part_null | | 0 + part_null | | 1 (8 rows) -- some more tests to exercise tuple-routing with multi-level partitioning -- 2.11.0
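To make the RT index bookkeeping in the expand_inherited_rtentry() hunk above easier to follow, here is a minimal standalone sketch of it. It is not part of the patch: SketchPartitionInfo and sketch_assign_rt_indexes() are hypothetical names used only for illustration, and plain unsigned integers stand in for Oid and Index. The root partitioned table keeps the RT index it already has, and the i-th partitioned child is given the "to-be" index rtable_length + i, on the assumption that no partitioned child is skipped before its RT entry is appended.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the patch's PartitionInfo node (illustration only). */
typedef struct SketchPartitionInfo
{
    unsigned int reloid;    /* table OID */
    unsigned int relid;     /* RT index, 1-based */
} SketchPartitionInfo;

/*
 * parted_oids[0] is the root partitioned table; the remaining entries are
 * its partitioned descendants in the order their RT entries will be
 * appended.  rtable_length is the range table length before any child
 * RT entry has been added.
 */
static void
sketch_assign_rt_indexes(unsigned int root_rti, size_t rtable_length,
                         const unsigned int *parted_oids, size_t nparted,
                         SketchPartitionInfo *out)
{
    size_t      i;

    assert(nparted >= 1 && parted_oids != NULL && out != NULL);

    /* The root keeps the RT index it already has in the query. */
    out[0].reloid = parted_oids[0];
    out[0].relid = root_rti;

    /* The i-th partitioned child gets the i-th newly appended slot. */
    for (i = 1; i < nparted; i++)
    {
        out[i].reloid = parted_oids[i];
        out[i].relid = (unsigned int) (rtable_length + i);
    }
}
```

The predicted indexes stay valid only while every partitioned child actually gets an RT entry in that order; if one were skipped (for example by the other-session temp table check), the later indexes would be off by one.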