On 2017/09/04 10:10, Amit Langote wrote: > On 2017/09/02 2:52, Robert Haas wrote: >> It strikes me that this patch set is doing two things but maybe in the >> opposite order that I would have chosen to attack them. First, >> there's getting partition pruning to use something other than >> constraint exclusion. Second, there's deferring work that is >> currently done at an early stage of the process until later, so that >> we waste less effort on partitions that are ultimately going to be >> pruned. > > OK. > >> Therefore, IMHO, it would be best to focus first on how we're going to >> identify the partitions that survive pruning, and then afterwards work >> on transposing that logic to happen before partitions are opened and >> locked. That way, we get some incremental benefit sooner, and also >> unblock some other development work. > > Alright, I will try to do it that way.
Attached set of patches that does things that way. Individual patches described below: [PATCH 1/5] Expand partitioned inheritance in a non-flattened manner This will allow us to perform scan and join planning in a per partition sub-tree manner, with each sub-tree's root getting its own RelOptInfo. Previously, only the root of the whole partition tree would get a RelOptInfo, along with the leaf partitions, with each leaf partition's AppendRelInfo pointing to the former as its parent. This is essential, because we will be doing partition-pruning for every partitioned table in the tree by matching query's scan keys with its partition key. We won't be able to do that if the intermediate partitioned tables didn't have a RelOptInfo. [PATCH 2/5] WIP: planner-side changes for partition-pruning This patch adds a stub get_partitions_for_keys in partition.c with a suitable interface for the caller to pass bounding keys extracted from the query and other related information. Importantly, it implements the logic within the planner to match query's scan keys to a parent table's partition key and form the bounding keys that will be passed to partition.c to compute the list of partitions that satisfy those bounds. Also, it adds certain fields to RelOptInfos of the partitioned tables that reflect its partitioning properties. [PATCH 3/5] WIP: Interface changes for partition_bound_{cmp/bsearch} This guy modifies the partition bound comparison function so that the caller can pass incomplete partition key tuple that is potentially a prefix of a multi-column partition key. Currently, the input tuple must contain all of key->partnatts values, but that may not be true for queries, which may not have restrictions on all the partition key columns. [PATCH 4/5] WIP: Implement get_partitions_for_keys() This one fills the get_partitions_for_keys stub with the actual logic that searches the partdesc->boundinfo for partition bounds that match the bounding keys specified by the caller. [PATCH 5/5] Add more tests for the new partitioning-related planning More tests. Some TODO items still remain to be done: * Process OR clauses to use for partition-pruning * Process redundant clauses (such as a = 10 and a > 1) more smartly * Other tricks that are missing * Fix bugs * More tests Thanks, Amit
From 17dfaff62fe04cf18f5bba298974d42f92b597ef Mon Sep 17 00:00:00 2001 From: amit <amitlangot...@gmail.com> Date: Wed, 6 Sep 2017 09:28:14 +0900 Subject: [PATCH 1/5] Expand partitioned inheritance in a non-flattened manner ...except when the partitioned table in question is the result rel of the query. This allows us perform scan and join planning for each sub-tree in a given partition tree, with each sub-tree's root partitioned table getting its own RelOptInfo. Previously only the root of the whole partition tree got a RelOptInfo, along with the leaf partitions, with each leaf partition's AppendRelInfo pointing to the former as its parent. --- src/backend/optimizer/path/allpaths.c | 34 ++++++- src/backend/optimizer/plan/planner.c | 3 +- src/backend/optimizer/prep/prepunion.c | 166 +++++++++++++++++++++------------ src/backend/optimizer/util/plancat.c | 7 +- 4 files changed, 146 insertions(+), 64 deletions(-) diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c index 2d7e1d84d0..6c3511bd47 100644 --- a/src/backend/optimizer/path/allpaths.c +++ b/src/backend/optimizer/path/allpaths.c @@ -1289,11 +1289,39 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte; rte = planner_rt_fetch(rel->relid, root); + + /* + * Get the partitioned_rels list from root->pcinfo_list after + * confirming that rel is actually a root partitioned table. + */ if (rte->relkind == RELKIND_PARTITIONED_TABLE) { - partitioned_rels = get_partitioned_child_rels(root, rel->relid); - /* The root partitioned table is included as a child rel */ - Assert(list_length(partitioned_rels) >= 1); + int parent_relid; + bool is_root_partitioned_table = false; + + /* + * Normally, only the root partitioned rel will be RELOPT_BASEREL + * in a given partitione tree, except when the root table itself + * is a child in the case of a UNION ALL query. + */ + if (!IS_OTHER_REL(rel)) + is_root_partitioned_table = true; + else if (bms_get_singleton_member(rel->top_parent_relids, + &parent_relid)) + { + RelOptInfo *parent_rel; + + parent_rel = root->simple_rel_array[parent_relid]; + is_root_partitioned_table = + (parent_rel->rtekind != RTE_RELATION); + } + + if (is_root_partitioned_table) + { + partitioned_rels = get_partitioned_child_rels(root, rel->relid); + /* The root partitioned table is included as a child rel */ + Assert(list_length(partitioned_rels) >= 1); + } } /* diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c index 966230256e..02662fad5d 100644 --- a/src/backend/optimizer/plan/planner.c +++ b/src/backend/optimizer/plan/planner.c @@ -6076,7 +6076,8 @@ plan_cluster_use_sort(Oid tableOid, Oid indexOid) * Returns a list of the RT indexes of the partitioned child relations * with rti as the root parent RT index. * - * Note: Only call this function on RTEs known to be partitioned tables. + * Note: Only call this function on RTEs known to be "root" partitioned + * tables. */ List * get_partitioned_child_rels(PlannerInfo *root, Index rti) diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c index ccf21453fd..433505948d 100644 --- a/src/backend/optimizer/prep/prepunion.c +++ b/src/backend/optimizer/prep/prepunion.c @@ -113,7 +113,8 @@ static void expand_single_inheritance_child(PlannerInfo *root, Index parentRTindex, Relation parentrel, PlanRowMark *parentrc, Relation childrel, bool *has_child, List **appinfos, - List **partitioned_child_rels); + List **partitioned_child_rels, + RangeTblEntry **newrte, Index *newRTindex); static void make_inh_translation_list(Relation oldrelation, Relation newrelation, Index newvarno, @@ -1380,8 +1381,8 @@ expand_inherited_tables(PlannerInfo *root) * regular inheritance, a parent RTE must always have at least two associated * AppendRelInfos: one corresponding to the parent table as a simple member of * inheritance set and one or more corresponding to the actual children. - * Since a partitioned table is not scanned, it might have only one associated - * AppendRelInfo. + * Since a partitioned table parent is itself not scanned, it might have only + * one associated AppendRelInfo. */ static void expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti) @@ -1473,13 +1474,10 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti) { /* * If this table has partitions, recursively expand them in the order - * in which they appear in the PartitionDesc. But first, expand the - * parent itself. + * in which they appear in the PartitionDesc. Also, start collecting + * the RT indexes of the partitioned tables in the partition tree. */ - expand_single_inheritance_child(root, rte, rti, oldrelation, oldrc, - oldrelation, - &has_child, &appinfos, - &partitioned_child_rels); + partitioned_child_rels = list_make1_int(rti); expand_partitioned_rtentry(root, rte, rti, oldrelation, oldrc, RelationGetPartitionDesc(oldrelation), lockmode, @@ -1516,10 +1514,14 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti) continue; } + /* + * Don't expect to find any partitioned tables in a regular + * inheritance tree, so pass NULL for partitioned_child_rels here. + */ expand_single_inheritance_child(root, rte, rti, oldrelation, oldrc, newrelation, &has_child, &appinfos, - &partitioned_child_rels); + NULL, NULL, NULL); /* Close child relations, but keep locks */ if (childOID != parentOID) @@ -1581,6 +1583,8 @@ expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte, { Oid childOID = partdesc->oids[i]; Relation childrel; + RangeTblEntry *childrte; + Index childRTindex; /* Open rel; we already have required locks */ childrel = heap_open(childOID, NoLock); @@ -1592,19 +1596,60 @@ expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte, continue; } - expand_single_inheritance_child(root, parentrte, parentRTindex, + expand_single_inheritance_child(root, + parentrte, parentRTindex, parentrel, parentrc, childrel, has_child, appinfos, - partitioned_child_rels); + partitioned_child_rels, + &childrte, &childRTindex); /* If this child is itself partitioned, recurse */ if (childrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) - expand_partitioned_rtentry(root, parentrte, parentRTindex, - parentrel, parentrc, - RelationGetPartitionDesc(childrel), - lockmode, - has_child, appinfos, - partitioned_child_rels); + { + RangeTblEntry *new_parentrte; + Index new_parentRTindex; + Relation new_parentrel; + + /* + * For SELECT queries, it's desirable to perform scan and join + * planning on the individual partition sub-trees, instead of + * doing the same on the whole tree at once. This allows to apply + * techniques such as parition-pruning and/or partition-wise join + * on the individual partition sub-trees. For that to happen, + * root parent of each sub-tree must get an RTE with inh set to + * true, which must be already taken care of by + * expand_single_inheritance_child(). Next, for each of the + * children, we must record immediate parent as its parent in the + * the child AppendRelInfo, instead of the root parent of the + * whole tree. + * + * If parent is the query's result relation, inheritance_planner() + * will expand the inheritance so as to apply the *whole* query to + * each leaf partition, which means we cannot apply + * partition-pruning and/or partition-wise join to a partitioned + * result relation, meaning there is not much point in expanding + * the tree hierarchically. + */ + if (parentRTindex == root->parse->resultRelation) + { + new_parentrte = parentrte; + new_parentRTindex = parentRTindex; + new_parentrel = parentrel; + } + else + { + new_parentrte = childrte; + new_parentRTindex = childRTindex; + new_parentrel = childrel; + } + + expand_partitioned_rtentry(root, new_parentrte, new_parentRTindex, + new_parentrel, parentrc, + RelationGetPartitionDesc(childrel), + lockmode, + has_child, appinfos, + partitioned_child_rels); + } /* Close child relation, but keep locks */ heap_close(childrel, NoLock); @@ -1619,13 +1664,17 @@ expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte, * anything at all. Otherwise, we'll set "has_child" to true, build a * RangeTblEntry and either a PartitionedChildRelInfo or AppendRelInfo as * appropriate, plus maybe a PlanRowMark. + * + * The newly created RT entry and its RT index are returned in *newrte and + * *newRTindex, respectively. */ static void expand_single_inheritance_child(PlannerInfo *root, RangeTblEntry *parentrte, Index parentRTindex, Relation parentrel, PlanRowMark *parentrc, Relation childrel, bool *has_child, List **appinfos, - List **partitioned_child_rels) + List **partitioned_child_rels, + RangeTblEntry **newrte, Index *newRTindex) { Query *parse = root->parse; Oid parentOID = RelationGetRelid(parentrel); @@ -1649,54 +1698,46 @@ expand_single_inheritance_child(PlannerInfo *root, RangeTblEntry *parentrte, childrte = copyObject(parentrte); childrte->relid = childOID; childrte->relkind = childrel->rd_rel->relkind; - childrte->inh = false; + childrte->inh = (childrte->relkind == RELKIND_PARTITIONED_TABLE); childrte->requiredPerms = 0; childrte->securityQuals = NIL; parse->rtable = lappend(parse->rtable, childrte); childRTindex = list_length(parse->rtable); + /* Build an AppendRelInfo for this parent and child. */ + + /* Remember if we saw a real child. */ + if (childOID != parentOID) + *has_child = true; + + appinfo = makeNode(AppendRelInfo); + appinfo->parent_relid = parentRTindex; + appinfo->child_relid = childRTindex; + appinfo->parent_reltype = parentrel->rd_rel->reltype; + appinfo->child_reltype = childrel->rd_rel->reltype; + make_inh_translation_list(parentrel, childrel, childRTindex, + &appinfo->translated_vars); + appinfo->parent_reloid = parentOID; + *appinfos = lappend(*appinfos, appinfo); + /* - * Build an AppendRelInfo for this parent and child, unless the child is a - * partitioned table. + * Translate the column permissions bitmaps to the child's attnums (we + * have to build the translated_vars list before we can do this). But + * if this is the parent table, leave copyObject's result alone. + * + * Note: we need to do this even though the executor won't run any + * permissions checks on the child RTE. The insertedCols/updatedCols + * bitmaps may be examined for trigger-firing purposes. */ - if (childrte->relkind != RELKIND_PARTITIONED_TABLE) + if (childOID != parentOID) { - /* Remember if we saw a real child. */ - if (childOID != parentOID) - *has_child = true; - - appinfo = makeNode(AppendRelInfo); - appinfo->parent_relid = parentRTindex; - appinfo->child_relid = childRTindex; - appinfo->parent_reltype = parentrel->rd_rel->reltype; - appinfo->child_reltype = childrel->rd_rel->reltype; - make_inh_translation_list(parentrel, childrel, childRTindex, - &appinfo->translated_vars); - appinfo->parent_reloid = parentOID; - *appinfos = lappend(*appinfos, appinfo); - - /* - * Translate the column permissions bitmaps to the child's attnums (we - * have to build the translated_vars list before we can do this). But - * if this is the parent table, leave copyObject's result alone. - * - * Note: we need to do this even though the executor won't run any - * permissions checks on the child RTE. The insertedCols/updatedCols - * bitmaps may be examined for trigger-firing purposes. - */ - if (childOID != parentOID) - { - childrte->selectedCols = translate_col_privs(parentrte->selectedCols, - appinfo->translated_vars); - childrte->insertedCols = translate_col_privs(parentrte->insertedCols, - appinfo->translated_vars); - childrte->updatedCols = translate_col_privs(parentrte->updatedCols, - appinfo->translated_vars); - } + childrte->selectedCols = translate_col_privs(parentrte->selectedCols, + appinfo->translated_vars); + childrte->insertedCols = translate_col_privs(parentrte->insertedCols, + appinfo->translated_vars); + childrte->updatedCols = translate_col_privs(parentrte->updatedCols, + appinfo->translated_vars); } - else - *partitioned_child_rels = lappend_int(*partitioned_child_rels, - childRTindex); /* * Build a PlanRowMark if parent is marked FOR UPDATE/SHARE. @@ -1726,6 +1767,15 @@ expand_single_inheritance_child(PlannerInfo *root, RangeTblEntry *parentrte, root->rowMarks = lappend(root->rowMarks, childrc); } + + if (partitioned_child_rels && + childrte->relkind == RELKIND_PARTITIONED_TABLE) + *partitioned_child_rels = lappend_int(*partitioned_child_rels, + childRTindex); + if (newrte) + *newrte = childrte; + if (newRTindex) + *newRTindex = childRTindex; } /* diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c index a1ebd4acc8..bfc05a1af5 100644 --- a/src/backend/optimizer/util/plancat.c +++ b/src/backend/optimizer/util/plancat.c @@ -1402,8 +1402,11 @@ relation_excluded_by_constraints(PlannerInfo *root, if (predicate_refuted_by(safe_restrictions, safe_restrictions, false)) return true; - /* Only plain relations have constraints */ - if (rte->rtekind != RTE_RELATION || rte->inh) + /* + * Only plain relations have constraints. In addition, there can be + * inheritance parent RTEs that are themselves partitions. + */ + if (rte->rtekind != RTE_RELATION || (rte->inh && !IS_OTHER_REL(rel))) return false; /* -- 2.11.0
From 0f1d944c23c0f9170ffe8553ef2d22754fa3aab7 Mon Sep 17 00:00:00 2001 From: amit <amitlangot...@gmail.com> Date: Tue, 22 Aug 2017 17:31:42 +0900 Subject: [PATCH 2/5] WIP: planner-side changes for partition-pruning Firstly, this adds a stub get_partitions_for_keys() in partition.c with appropriate interface for the caller to specify bounding scan keys, along with other information about the scan keys extracted from the query, such as NULL-ness of the keys, inclusive-ness, etc. More importantly, this implements the planner-side logic to extract bounding scan keys to be passed to get_partitions_for_keys. That is, it will go through rel->baserestrictinfo and match individual clauses to partition keys and construct lower bound and upper bound tuples, which may cover only a prefix of a multi-column partition key. A bunch of smarts are still missing when mapping the clause operands with keys. For example, code to match a clause is specifed as (constant op var) doesn't exist. Also, redundant keys are not eliminated, for example, a combination of clauses a = 10 and a > 1 will cause the later clause a > 1 taking over and resulting in needless scanning of partitions containing values a > 1 and a < 10. ...constraint exclusion is still used, because get_partitions_for_keys is just a stub... --- src/backend/catalog/partition.c | 42 +++++ src/backend/optimizer/path/allpaths.c | 308 +++++++++++++++++++++++++++++----- src/backend/optimizer/util/plancat.c | 120 +++++++++++++ src/backend/optimizer/util/relnode.c | 20 +++ src/include/catalog/partition.h | 8 + src/include/nodes/nodes.h | 1 + src/include/nodes/relation.h | 135 +++++++++++++++ src/include/optimizer/plancat.h | 2 + 8 files changed, 595 insertions(+), 41 deletions(-) diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c index 50162632f5..bb3009e5b3 100644 --- a/src/backend/catalog/partition.c +++ b/src/backend/catalog/partition.c @@ -1138,6 +1138,48 @@ RelationGetPartitionDispatchInfo(Relation rel, return pd; } +/* + * get_partitions_for_keys + * Returns the list of indexes of rel's partitions that will need to be + * scanned given the bounding scan keys. + * + * Each value in the returned list can be used as an index into the oids array + * of the partition descriptor. + * + * Inputs: + * keynullness contains between 0 and (key->partnatts - 1) values, each + * telling what kind of NullTest has been applies to the corresponding + * partition key column. minkeys represents the lower bound on the partition + * the key of the records that the query will return, while maxkeys + * represents upper bound. min_inclusive and max_inclusive tell whether the + * bounds specified minkeys and maxkeys is inclusive, respectively. + * + * Other outputs: + * *min_datum_index will return the index in boundinfo->datums of the first + * datum that the query's bounding keys allow to be returned for the query. + * Similarly, *max_datum_index. *null_partition_chosen returns whether + * the null partition will be scanned. + * + * TODO: Implement. + */ +List * +get_partitions_for_keys(Relation rel, + NullTestType *keynullness, + Datum *minkeys, int n_minkeys, bool min_inclusive, + Datum *maxkeys, int n_maxkeys, bool max_inclusive, + int *min_datum_index, int *max_datum_index, + bool *null_partition_chosen) +{ + List *result = NIL; + int i; + PartitionDesc partdesc = RelationGetPartitionDesc(rel); + + for (i = 0; i < partdesc->nparts; i++) + result = lappend_int(result, i); + + return result; +} + /* Module-local functions */ /* diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c index 6c3511bd47..97af646242 100644 --- a/src/backend/optimizer/path/allpaths.c +++ b/src/backend/optimizer/path/allpaths.c @@ -20,6 +20,7 @@ #include "access/sysattr.h" #include "access/tsmapi.h" +#include "catalog/partition.h" #include "catalog/pg_class.h" #include "catalog/pg_operator.h" #include "catalog/pg_proc.h" @@ -845,6 +846,222 @@ set_foreign_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte) } /* + * get_rel_partitions + * Return the list of partitions of rel that pass the query clauses + * + * Returned list contains the AppendInfos of the chosen partitions. + */ +static List * +get_rel_partitions(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte) +{ + Relation parent = heap_open(rte->relid, NoLock); + PartitionDesc partdesc = RelationGetPartitionDesc(parent); + List *indexes; + List *result = NIL; + ListCell *lc1, + *lc2; + int keyPos; + List *matchedclauses[PARTITION_MAX_KEYS]; + NullTestType keynullness[PARTITION_MAX_KEYS]; + Datum minkeys[PARTITION_MAX_KEYS], + maxkeys[PARTITION_MAX_KEYS]; + bool need_next_min, + need_next_max, + minkey_set[PARTITION_MAX_KEYS], + maxkey_set[PARTITION_MAX_KEYS], + min_incl, + max_incl; + int n_minkeys = 0, + n_maxkeys = 0, + i; + + /* + * Match individual OpExprs in the query's restriction with individual + * partition key columns. There is one list per key. + */ + memset(keynullness, -1, sizeof(keynullness)); + memset(matchedclauses, 0, sizeof(matchedclauses)); + keyPos = 0; + for (i = 0; i < rel->part_scheme->partnatts; i++) + { + Node *partkey = linitial(rel->partexprs[i]); + + foreach(lc2, rel->baserestrictinfo) + { + RestrictInfo *rinfo = lfirst(lc2); + Expr *clause = rinfo->clause; + + if (is_opclause(clause)) + { + Node *leftop = get_leftop(clause); + + if (IsA(leftop, RelabelType)) + leftop = (Node *) ((RelabelType *) leftop)->arg; + + if (equal(leftop, partkey)) + { + matchedclauses[keyPos] = lappend(matchedclauses[keyPos], + clause); + /* A strict operator implies NOT NULL argument. */ + keynullness[keyPos] = IS_NOT_NULL; + } + } + else if (IsA(clause, NullTest)) + { + NullTest *nulltest = (NullTest *) clause; + Node *arg = (Node *) nulltest->arg; + + if (equal(arg, partkey)) + keynullness[keyPos] = nulltest->nulltesttype; + } + } + + /* Onto finding clauses matching the next partition key. */ + keyPos++; + } + + /* + * Determine the min keys and the max keys using btree semantics-based + * interpretation of the clauses' operators. + */ + + /* + * XXX - There should be a step similar to _bt_preprocess_keys() here, + * to eliminate any redundant scan keys for a given partition column. For + * example, among a <= 4 and a <= 5, we can only keep a <= 4 for being + * more restrictive and discard a <= 5. While doing that, we can also + * check to see if there exists a contradictory combination of scan keys + * that makes the query trivially false for all records in the table. + */ + memset(minkeys, 0, sizeof(minkeys)); + memset(maxkeys, 0, sizeof(maxkeys)); + memset(minkey_set, false, sizeof(minkey_set)); + memset(maxkey_set, false, sizeof(maxkey_set)); + need_next_min = true; + need_next_max = true; + for (i = 0; i < rel->part_scheme->partnatts; i++) + { + /* + * If no scan key existed for the previous column, we are done. + */ + if (i > n_minkeys) + need_next_min = false; + + if (i > n_maxkeys) + need_next_max = false; + + foreach(lc1, matchedclauses[i]) + { + Expr *clause = lfirst(lc1); + Const *rightop = (Const *) get_rightop(clause); + Oid opno = ((OpExpr *) clause)->opno, + opfamily = rel->part_scheme->partopfamily[i]; + StrategyNumber strategy; + + strategy = get_op_opfamily_strategy(opno, opfamily); + switch (strategy) + { + case BTLessStrategyNumber: + case BTLessEqualStrategyNumber: + if (need_next_max) + { + maxkeys[i] = rightop->constvalue; + if (!maxkey_set[i]) + n_maxkeys++; + maxkey_set[i] = true; + max_incl = (strategy == BTLessEqualStrategyNumber); + } + if (strategy == BTLessStrategyNumber) + need_next_max = false; + break; + + case BTGreaterStrategyNumber: + case BTGreaterEqualStrategyNumber: + if (need_next_min) + { + minkeys[i] = rightop->constvalue; + if (!minkey_set[i]) + n_minkeys++; + minkey_set[i] = true; + min_incl = (strategy == BTGreaterEqualStrategyNumber); + } + if (strategy == BTGreaterStrategyNumber) + need_next_min = false; + break; + + case BTEqualStrategyNumber: + if (need_next_min) + { + minkeys[i] = rightop->constvalue; + if (!minkey_set[i]) + n_minkeys++; + } + minkey_set[i] = true; + min_incl = true; + + if (need_next_max) + { + maxkeys[i] = rightop->constvalue; + if (!maxkey_set[i]) + n_maxkeys++; + } + maxkey_set[i] = true; + max_incl = true; + break; + + /* + * This might mean '<>', but we don't have anything for that + * case yet. Perhaps, handle that as key < const OR + * key > const, once we have props needed for handling OR + * clauses. + */ + default: + min_incl = max_incl = false; + break; + } + } + } + + /* Ask partition.c which partitions it thinks match the keys. */ + indexes = get_partitions_for_keys(parent, keynullness, + minkeys, n_minkeys, min_incl, + maxkeys, n_maxkeys, max_incl, + &rel->painfo->min_datum_idx, + &rel->painfo->max_datum_idx, + &rel->painfo->contains_null_partition); + + if (indexes != NIL) + { +#ifdef USE_ASSERT_CHECKING + int first_index, + last_index; + first_index = linitial_int(indexes); + last_index = llast_int(indexes); + Assert(first_index <= last_index || + rel->part_scheme->strategy != PARTITION_STRATEGY_RANGE); +#endif + + foreach(lc1, indexes) + { + int partidx = lfirst_int(lc1); + AppendRelInfo *appinfo = rel->child_appinfos[partidx]; +#ifdef USE_ASSERT_CHECKING + RangeTblEntry *rte = planner_rt_fetch(appinfo->child_relid, root); + Assert(partdesc->oids[partidx] == rte->relid); +#endif + result = lappend(result, appinfo); + } + } + + /* Remember for future users such as set_append_rel_pathlist(). */ + rel->painfo->live_partition_appinfos = result; + + heap_close(parent, NoLock); + + return result; +} + +/* * set_append_rel_size * Set size estimates for a simple "append relation" * @@ -859,6 +1076,7 @@ static void set_append_rel_size(PlannerInfo *root, RelOptInfo *rel, Index rti, RangeTblEntry *rte) { + List *rel_appinfos = NIL; int parentRTindex = rti; bool has_live_children; double parent_rows; @@ -869,6 +1087,24 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel, Assert(IS_SIMPLE_REL(rel)); + if (rte->relkind != RELKIND_PARTITIONED_TABLE) + { + foreach (l, root->append_rel_list) + { + AppendRelInfo *appinfo = lfirst(l); + + /* append_rel_list contains all append rels; ignore others */ + if (appinfo->parent_relid == parentRTindex) + rel_appinfos = lappend(rel_appinfos, appinfo); + } + } + else + { + rel_appinfos = get_rel_partitions(root, rel, rte); + Assert(rel->painfo != NULL); + rel->painfo->live_partitioned_rels = list_make1_int(rti); + } + /* * Initialize to compute size estimates for whole append relation. * @@ -889,7 +1125,7 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel, nattrs = rel->max_attr - rel->min_attr + 1; parent_attrsizes = (double *) palloc0(nattrs * sizeof(double)); - foreach(l, root->append_rel_list) + foreach(l, rel_appinfos) { AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l); int childRTindex; @@ -902,10 +1138,6 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel, ListCell *childvars; ListCell *lc; - /* append_rel_list contains all append rels; ignore others */ - if (appinfo->parent_relid != parentRTindex) - continue; - childRTindex = appinfo->child_relid; childRTE = root->simple_rte_array[childRTindex]; @@ -1114,6 +1346,17 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel, has_live_children = true; /* + * If childrel is itself partitioned, add it and its partitioned + * children to the list being propagated up to the root rel. + */ + if (childrel->painfo && rel->painfo) + { + rel->painfo->live_partitioned_rels = + list_concat(rel->painfo->live_partitioned_rels, + list_copy(childrel->painfo->live_partitioned_rels)); + } + + /* * If any live child is not parallel-safe, treat the whole appendrel * as not parallel-safe. In future we might be able to generate plans * in which some children are farmed out to workers while others are @@ -1209,14 +1452,29 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, Index rti, RangeTblEntry *rte) { int parentRTindex = rti; - List *live_childrels = NIL; + List *rel_appinfos = NIL, + *live_childrels = NIL; ListCell *l; + if (rte->relkind != RELKIND_PARTITIONED_TABLE) + { + foreach (l, root->append_rel_list) + { + AppendRelInfo *appinfo = lfirst(l); + + /* append_rel_list contains all append rels; ignore others */ + if (appinfo->parent_relid == parentRTindex) + rel_appinfos = lappend(rel_appinfos, appinfo); + } + } + else + rel_appinfos = rel->painfo->live_partition_appinfos; + /* * Generate access paths for each member relation, and remember the * non-dummy children. */ - foreach(l, root->append_rel_list) + foreach(l, rel_appinfos) { AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l); int childRTindex; @@ -1289,40 +1547,8 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte; rte = planner_rt_fetch(rel->relid, root); - - /* - * Get the partitioned_rels list from root->pcinfo_list after - * confirming that rel is actually a root partitioned table. - */ - if (rte->relkind == RELKIND_PARTITIONED_TABLE) - { - int parent_relid; - bool is_root_partitioned_table = false; - - /* - * Normally, only the root partitioned rel will be RELOPT_BASEREL - * in a given partitione tree, except when the root table itself - * is a child in the case of a UNION ALL query. - */ - if (!IS_OTHER_REL(rel)) - is_root_partitioned_table = true; - else if (bms_get_singleton_member(rel->top_parent_relids, - &parent_relid)) - { - RelOptInfo *parent_rel; - - parent_rel = root->simple_rel_array[parent_relid]; - is_root_partitioned_table = - (parent_rel->rtekind != RTE_RELATION); - } - - if (is_root_partitioned_table) - { - partitioned_rels = get_partitioned_child_rels(root, rel->relid); - /* The root partitioned table is included as a child rel */ - Assert(list_length(partitioned_rels) >= 1); - } - } + if (rte->relkind == RELKIND_PARTITIONED_TABLE && IS_SIMPLE_REL(rel)) + partitioned_rels = rel->painfo->live_partitioned_rels; /* * For every non-dummy child, remember the cheapest path. Also, identify diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c index bfc05a1af5..de50b5d86a 100644 --- a/src/backend/optimizer/util/plancat.c +++ b/src/backend/optimizer/util/plancat.c @@ -68,6 +68,8 @@ static List *get_relation_constraints(PlannerInfo *root, static List *build_index_tlist(PlannerInfo *root, IndexOptInfo *index, Relation heapRelation); static List *get_relation_statistics(RelOptInfo *rel, Relation relation); +static void get_relation_partition_info(PlannerInfo *root, RelOptInfo *rel, + Relation relation); /* * get_relation_info - @@ -420,6 +422,10 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent, /* Collect info about relation's foreign keys, if relevant */ get_relation_foreign_keys(root, rel, relation, inhparent); + /* Collect partitioning info, if relevant. */ + if (relation->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) + get_relation_partition_info(root, rel, relation); + heap_close(relation, NoLock); /* @@ -1805,3 +1811,117 @@ has_row_triggers(PlannerInfo *root, Index rti, CmdType event) heap_close(relation, NoLock); return result; } + +static void +get_relation_partition_info(PlannerInfo *root, RelOptInfo *rel, + Relation relation) +{ + int i; + ListCell *l; + PartitionKey key = RelationGetPartitionKey(relation); + PartitionDesc partdesc = RelationGetPartitionDesc(relation); + + rel->part_scheme = find_partition_scheme(root, relation); + rel->partexprs = (List **) palloc0(key->partnatts * sizeof(List *)); + + l = list_head(key->partexprs); + for (i = 0; i < key->partnatts; i++) + { + Expr *keyCol; + + if (key->partattrs[i] != 0) + { + keyCol = (Expr *) makeVar(rel->relid, + key->partattrs[i], + key->parttypid[i], + key->parttypmod[i], + key->parttypcoll[i], + 0); + } + else + { + if (l == NULL) + elog(ERROR, "wrong number of partition key expressions"); + keyCol = (Expr *) copyObject(lfirst(l)); + l = lnext(l); + } + + rel->partexprs[i] = list_make1(keyCol); + } + + /* Values are filled in build_simple_rel(). */ + rel->child_appinfos = (AppendRelInfo **) palloc0(partdesc->nparts * + sizeof(AppendRelInfo *)); + + /* + * A PartitionAppendInfo to map this table to its immediate partitions + * that will be scanned by this query. At the same time, it records the + * table's partitioning properties reflecting any partition-pruning that + * might occur to satisfy the query. Rest of the fields are set in + * get_rel_partitions() and set_append_rel_size(). + */ + rel->painfo = makeNode(PartitionAppendInfo); + rel->painfo->boundinfo = partdesc->boundinfo; +} + +/* + * find_partition_scheme + * + * The function returns a canonical partition scheme which exactly matches the + * partitioning scheme of the given relation if one exists in the list of + * canonical partitioning schemes maintained in PlannerInfo. If none of the + * existing partitioning schemes match, the function creates a canonical + * partition scheme and adds it to the list. + * + * For an unpartitioned table or for a multi-level partitioned table it returns + * NULL. See comments in the function for more details. + */ +PartitionScheme +find_partition_scheme(PlannerInfo *root, Relation relation) +{ + ListCell *lc; + PartitionKey key = RelationGetPartitionKey(relation); + char strategy = key->strategy; + int partnatts = key->partnatts; + PartitionScheme part_scheme = NULL; + + /* Search for a matching partition scheme and return if found one. */ + foreach(lc, root->partition_schemes) + { + part_scheme = lfirst(lc); + + /* Match various partitioning attributes. */ + if (strategy != part_scheme->strategy || + partnatts != part_scheme->partnatts || + memcmp(key->parttypid, part_scheme->parttypid, + sizeof(Oid) * partnatts) != 0 || + memcmp(key->parttypmod, part_scheme->parttypmod, + sizeof(int32) * partnatts) != 0 || + memcmp(key->partcollation, part_scheme->partcollation, + sizeof(Oid) * partnatts) != 0 || + memcmp(key->partopfamily, part_scheme->partopfamily, + sizeof(Oid) * partnatts) != 0 || + memcmp(key->partopcintype, part_scheme->partopcintype, + sizeof(Oid) * partnatts) != 0) + continue; + + /* Found a matching partition scheme. */ + return part_scheme; + } + + /* Did not find matching partition scheme. Create one. */ + part_scheme = (PartitionScheme) palloc0(sizeof(PartitionSchemeData)); + + part_scheme->strategy = strategy; + part_scheme->partnatts = partnatts; + part_scheme->parttypid = key->parttypid; + part_scheme->parttypmod = key->parttypmod; + part_scheme->partcollation = key->partcollation; + part_scheme->partopfamily = key->partopfamily; + part_scheme->partopcintype = key->partopcintype; + + /* Add the partitioning scheme to PlannerInfo. */ + root->partition_schemes = lappend(root->partition_schemes, part_scheme); + + return part_scheme; +} diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c index 8ad0b4a669..390d3b4956 100644 --- a/src/backend/optimizer/util/relnode.c +++ b/src/backend/optimizer/util/relnode.c @@ -17,6 +17,7 @@ #include <limits.h> #include "miscadmin.h" +#include "catalog/pg_class.h" #include "optimizer/clauses.h" #include "optimizer/cost.h" #include "optimizer/pathnode.h" @@ -163,6 +164,11 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent) else rel->top_parent_relids = NULL; + rel->child_appinfos = NULL; + rel->part_scheme = NULL; + rel->partexprs = NULL; + rel->painfo = NULL; + /* Check type of rtable entry */ switch (rte->rtekind) { @@ -218,7 +224,18 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent) if (rte->inh) { ListCell *l; + AppendRelInfo **child_appinfos = NULL; + int i; + if (rte->relkind == RELKIND_PARTITIONED_TABLE) + { + Assert(rel->part_scheme != NULL); + Assert(rel->child_appinfos != NULL); + Assert(rel->painfo != NULL); + child_appinfos = rel->child_appinfos; + } + + i = 0; foreach(l, root->append_rel_list) { AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l); @@ -229,6 +246,9 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent) (void) build_simple_rel(root, appinfo->child_relid, rel); + + if (child_appinfos) + child_appinfos[i++] = appinfo; } } diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h index 2283c675e9..fd16494909 100644 --- a/src/include/catalog/partition.h +++ b/src/include/catalog/partition.h @@ -99,4 +99,12 @@ extern int get_partition_for_tuple(PartitionDispatch *pd, EState *estate, PartitionDispatchData **failed_at, TupleTableSlot **failed_slot); + +/* Planner support stuff. */ +extern List *get_partitions_for_keys(Relation rel, + NullTestType *keynullness, + Datum *minkeys, int n_minkeys, bool min_inclusive, + Datum *maxkeys, int n_maxkeys, bool max_inclusive, + int *min_datum_index, int *max_datum_index, + bool *null_partition_chosen); #endif /* PARTITION_H */ diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h index 27bd4f3363..63196a1211 100644 --- a/src/include/nodes/nodes.h +++ b/src/include/nodes/nodes.h @@ -261,6 +261,7 @@ typedef enum NodeTag T_SpecialJoinInfo, T_AppendRelInfo, T_PartitionedChildRelInfo, + T_PartitionAppendInfo, T_PlaceHolderInfo, T_MinMaxAggInfo, T_PlannerParamItem, diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h index a39e59d8ac..2b535984a7 100644 --- a/src/include/nodes/relation.h +++ b/src/include/nodes/relation.h @@ -266,6 +266,8 @@ typedef struct PlannerInfo List *distinct_pathkeys; /* distinctClause pathkeys, if any */ List *sort_pathkeys; /* sortClause pathkeys, if any */ + List *partition_schemes; /* List of PartitionScheme objects. */ + List *initial_rels; /* RelOptInfos we are now trying to join */ /* Use fetch_upper_rel() to get any particular upper rel */ @@ -326,6 +328,48 @@ typedef struct PlannerInfo ((root)->simple_rte_array ? (root)->simple_rte_array[rti] : \ rt_fetch(rti, (root)->parse->rtable)) +/* + * Partitioning scheme + * Structure to hold partitioning scheme for a given relation. + * + * Multiple relations may be partitioned in the same way. The relations + * resulting from joining such relations may be partitioned in the same way as + * the joining relations. Similarly, relations derived from such relations by + * grouping, sorting may be partitioned in the same way as the underlying scan + * relations. All such relations partitioned in the same way share the + * partitioning scheme. + * + * PlannerInfo stores a list of distinct "canonical" partitioning schemes. + * RelOptInfo of a partitioned relation holds the pointer to "canonical" + * partitioning scheme. + * + * We store opclass declared input data types instead of partition key + * datatypes since those are the ones used to compare partition bounds instead + * of actual partition key data types. Since partition key data types and the + * opclass declared input data types are expected to be binary compatible (per + * ResolveOpClass()), both of those should have same byval and length + * properties. + * + * The structure caches information about partition key data type to be used + * while matching partition bounds. While comparing partition schemes we don't + * need to compare this information as it should be same when opclass declared + * input data types are same for two partitioned relations. + */ +typedef struct PartitionSchemeData +{ + char strategy; /* Partitioning strategy */ + int16 partnatts; /* Number of partitioning attributes */ + + /* The following arrays each have partnatts members. */ + Oid *parttypid; /* Type OIDs */ + int32 *parttypmod; /* Typemod values */ + Oid *partcollation; /* Partitioning collation */ + Oid *partopfamily; /* Operator family OIDs */ + Oid *partopcintype; /* Operator class-declared input type OIDs */ +} PartitionSchemeData; + +typedef struct PartitionSchemeData *PartitionScheme; + /*---------- * RelOptInfo @@ -515,6 +559,9 @@ typedef enum RelOptKind /* Is the given relation an "other" relation? */ #define IS_OTHER_REL(rel) ((rel)->reloptkind == RELOPT_OTHER_MEMBER_REL) +typedef struct AppendRelInfo AppendRelInfo; +typedef struct PartitionAppendInfo PartitionAppendInfo; + typedef struct RelOptInfo { NodeTag type; @@ -592,6 +639,48 @@ typedef struct RelOptInfo /* used by "other" relations */ Relids top_parent_relids; /* Relids of topmost parents */ + + /* Fields set for partitioned relations */ + + /* + * Information about the partitioning attributes, such as the number of + * attributes, arrays containing per-attribute type/tpymod, partitioning + * collation, operator family OIDs, etc. + */ + PartitionScheme part_scheme; + + /* + * Following contains the exact identities of the individual partitioning + * attributes. For example, if the attribute is a table's column, then + * it will be represented herein by a Var node for the same. This is + * structured as an array of Lists with part_scheme->partnatts members, + * with each list containing the expression(s) corresponding to the ith + * partitioning attribute (0 <= i < part_schem->partnatts) of this rel. + * For baserels, there is just a single expression in each slot (the ith + * list) of the array, because it corresponds to just one table. But for + * a joinrel, there will be as many expressions as there are tables + * involved in that joinrel. We have to do it that way, because in the + * joinrel case, the same corresponding partitioning attribute may have + * different identities in different tables involved in the join; for + * example, a Var node's varno will differ and so might varattnos. + */ + List **partexprs; + + /* AppendRelInfos of *all* partitions of the table. */ + AppendRelInfo **child_appinfos; + + /* + * For a partitioned relation, the following represents the identities + * of its live partition (their RT indexes) and some informations about + * the bounds that the live partitions satisfy. + */ + PartitionAppendInfo *painfo; + + /* + * RT index of the root partitioned table in the the partition tree of + * which this rel is a member. + */ + Index root_parent_relid; } RelOptInfo; /* @@ -2031,6 +2120,52 @@ typedef struct PartitionedChildRelInfo List *child_rels; } PartitionedChildRelInfo; +/* Forward declarations, to avoid including other headers */ +typedef struct PartitionDispatchData *PartitionDispatch; +typedef struct PartitionBoundInfoData *PartitionBoundInfo; +typedef struct PartitionKeyData *PartitionKey; + +/* + * PartitionAppendInfo - Properties of partitions contained in the Append path + * of a given partitioned table + */ +typedef struct PartitionAppendInfo +{ + NodeTag type; + + /* + * List of AppendRelInfos of the table's partitions that satisfy a given + * query. + */ + List *live_partition_appinfos; + + /* + * RT indexes of live partitions that are partitioned tables themselves. + * This includes the RT index of the table itself. + */ + List *live_partitioned_rels; + + /* + * The following simply copies the pointer to boundinfo in the table's + * PartitionDesc. + */ + PartitionBoundInfo boundinfo; + + /* + * Indexes in the boundinfo->datums array of the smallest and the largest + * value of the partition key that the query allows. They are set by + * calling get_partitions_for_keys(). + */ + int min_datum_idx; + int max_datum_idx; + + /* + * Does this Append contain the null-accepting partition, if one exists + * and is allowed by the query's quals. + */ + bool contains_null_partition; +} PartitionAppendInfo; + /* * For each distinct placeholder expression generated during planning, we * store a PlaceHolderInfo node in the PlannerInfo node's placeholder_list. diff --git a/src/include/optimizer/plancat.h b/src/include/optimizer/plancat.h index 71f0faf938..c45db074c6 100644 --- a/src/include/optimizer/plancat.h +++ b/src/include/optimizer/plancat.h @@ -56,5 +56,7 @@ extern Selectivity join_selectivity(PlannerInfo *root, SpecialJoinInfo *sjinfo); extern bool has_row_triggers(PlannerInfo *root, Index rti, CmdType event); +extern PartitionScheme find_partition_scheme(PlannerInfo *root, + Relation relation); #endif /* PLANCAT_H */ -- 2.11.0
From 1921fc38dee9bd89f42f89012dd1d57ead4dc951 Mon Sep 17 00:00:00 2001 From: amit <amitlangot...@gmail.com> Date: Tue, 22 Aug 2017 11:33:27 +0900 Subject: [PATCH 3/5] WIP: Interface changes for partition_bound_{cmp/bsearch} Introduces a notion of PartitionBoundCmpArg, which replaces the set of arguments void *probe and bool probe_is_bound of both partition_bound_cmp and partition_bound_bsearch. It wasn't possible before to specify the number of datums when a non-bound type of probe is passed. Slightly tweaking the existing interface to allow specifying the same seems awkward. So, instead encapsulate that into PartitionBoundCmpArg. Also, modify partition_rbound_datum_cmp to compare caller-specifed number of datums, instead of key->partnatts datums. --- src/backend/catalog/partition.c | 123 +++++++++++++++++++++++++++++----------- 1 file changed, 90 insertions(+), 33 deletions(-) diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c index bb3009e5b3..14876f8ea3 100644 --- a/src/backend/catalog/partition.c +++ b/src/backend/catalog/partition.c @@ -105,6 +105,30 @@ typedef struct PartitionRangeBound bool lower; /* this is the lower (vs upper) bound */ } PartitionRangeBound; +/* + * PartitionBoundCmpArg - Caller-defined argument to be passed to + * partition_bound_cmp() + * + * The first (fixed) argument involved in a comparison is the user-defined + * partition bound of a given existing partition, while an instance of the + * following struct describes either a new partition bound being compared + * against existing bounds (is_bound is true in that case and either lbound + * or rbound is set), or a new tuple's partition key specified in datums + * (ndatums = number of partition key columns). + */ +typedef struct PartitionBoundCmpArg +{ + bool is_bound; + union + { + PartitionListValue *lbound; + PartitionRangeBound *rbound; + } bound; + + Datum *datums; + int ndatums; +} PartitionBoundCmpArg; + static int32 qsort_partition_list_value_cmp(const void *a, const void *b, void *arg); static int32 qsort_partition_rbound_cmp(const void *a, const void *b, @@ -131,14 +155,15 @@ static int32 partition_rbound_cmp(PartitionKey key, bool lower1, PartitionRangeBound *b2); static int32 partition_rbound_datum_cmp(PartitionKey key, Datum *rb_datums, PartitionRangeDatumKind *rb_kind, - Datum *tuple_datums); + Datum *tuple_datums, int n_tuple_datums); static int32 partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo, - int offset, void *probe, bool probe_is_bound); + int offset, PartitionBoundCmpArg *arg); static int partition_bound_bsearch(PartitionKey key, PartitionBoundInfo boundinfo, - void *probe, bool probe_is_bound, bool *is_equal); + PartitionBoundCmpArg *arg, + bool *is_equal); /* * RelationBuildPartitionDesc @@ -663,10 +688,16 @@ check_new_partition_bound(char *relname, Relation parent, { int offset; bool equal; - + PartitionListValue lbound; + PartitionBoundCmpArg arg; + + memset(&lbound, 0, sizeof(PartitionListValue)); + lbound.value = val->constvalue; + memset(&arg, 0, sizeof(PartitionBoundCmpArg)); + arg.is_bound = true; + arg.bound.lbound = &lbound; offset = partition_bound_bsearch(key, boundinfo, - &val->constvalue, - true, &equal); + &arg, &equal); if (offset >= 0 && equal) { overlap = true; @@ -717,6 +748,7 @@ check_new_partition_bound(char *relname, Relation parent, PartitionBoundInfo boundinfo = partdesc->boundinfo; int offset; bool equal; + PartitionBoundCmpArg arg; Assert(boundinfo && boundinfo->ndatums > 0 && boundinfo->strategy == PARTITION_STRATEGY_RANGE); @@ -736,8 +768,11 @@ check_new_partition_bound(char *relname, Relation parent, * since the index array is initialised with an extra -1 * at the end. */ - offset = partition_bound_bsearch(key, boundinfo, lower, - true, &equal); + memset(&arg, 0, sizeof(PartitionBoundCmpArg)); + arg.is_bound = true; + arg.bound.rbound = lower; + offset = partition_bound_bsearch(key, boundinfo, &arg, + &equal); if (boundinfo->indexes[offset + 1] < 0) { @@ -751,9 +786,9 @@ check_new_partition_bound(char *relname, Relation parent, { int32 cmpval; + arg.bound.rbound = upper; cmpval = partition_bound_cmp(key, boundinfo, - offset + 1, upper, - true); + offset + 1, &arg); if (cmpval < 0) { /* @@ -2037,9 +2072,14 @@ get_partition_for_tuple(PartitionDispatch *pd, { /* Else bsearch in partdesc->boundinfo */ bool equal = false; + PartitionBoundCmpArg arg; + memset(&arg, 0, sizeof(PartitionBoundCmpArg)); + arg.is_bound = false; + arg.datums = values; + arg.ndatums = key->partnatts; cur_offset = partition_bound_bsearch(key, partdesc->boundinfo, - values, false, &equal); + &arg, &equal); switch (key->strategy) { case PARTITION_STRATEGY_LIST: @@ -2237,12 +2277,12 @@ partition_rbound_cmp(PartitionKey key, static int32 partition_rbound_datum_cmp(PartitionKey key, Datum *rb_datums, PartitionRangeDatumKind *rb_kind, - Datum *tuple_datums) + Datum *tuple_datums, int n_tuple_datums) { int i; int32 cmpval = -1; - for (i = 0; i < key->partnatts; i++) + for (i = 0; i < n_tuple_datums; i++) { if (rb_kind[i] == PARTITION_RANGE_DATUM_MINVALUE) return -1; @@ -2264,11 +2304,11 @@ partition_rbound_datum_cmp(PartitionKey key, * partition_bound_cmp * * Return whether the bound at offset in boundinfo is <, =, or > the argument - * specified in *probe. + * specified in *arg. */ static int32 partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo, - int offset, void *probe, bool probe_is_bound) + int offset, PartitionBoundCmpArg *arg) { Datum *bound_datums = boundinfo->datums[offset]; int32 cmpval = -1; @@ -2276,17 +2316,35 @@ partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo, switch (key->strategy) { case PARTITION_STRATEGY_LIST: - cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[0], - key->partcollation[0], - bound_datums[0], - *(Datum *) probe)); - break; + { + Datum listdatum; + + if (arg->is_bound) + listdatum = arg->bound.lbound->value; + else + { + if (arg->ndatums >= 1) + listdatum = arg->datums[0]; + /* + * If no tuple datum to compare with the bound, consider + * the latter to be greater. + */ + else + return 1; + } + + cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[0], + key->partcollation[0], + bound_datums[0], + listdatum)); + break; + } case PARTITION_STRATEGY_RANGE: { PartitionRangeDatumKind *kind = boundinfo->kind[offset]; - if (probe_is_bound) + if (arg->is_bound) { /* * We need to pass whether the existing bound is a lower @@ -2297,12 +2355,13 @@ partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo, cmpval = partition_rbound_cmp(key, bound_datums, kind, lower, - (PartitionRangeBound *) probe); + arg->bound.rbound); } else cmpval = partition_rbound_datum_cmp(key, bound_datums, kind, - (Datum *) probe); + arg->datums, + arg->ndatums); break; } @@ -2316,20 +2375,19 @@ partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo, /* * Binary search on a collection of partition bounds. Returns greatest - * bound in array boundinfo->datums which is less than or equal to *probe. - * If all bounds in the array are greater than *probe, -1 is returned. + * bound in array boundinfo->datums which is less than or equal to *arg. + * If all bounds in the array are greater than *arg, -1 is returned. * - * *probe could either be a partition bound or a Datum array representing - * the partition key of a tuple being routed; probe_is_bound tells which. - * We pass that down to the comparison function so that it can interpret the - * contents of *probe accordingly. + * *arg could either be a partition bound or a Datum array representing + * the partition key of a tuple being routed. We simply pass that down to + * partition_bound_cmp where it is interpreted appropriately. * * *is_equal is set to whether the bound at the returned index is equal with - * *probe. + * *arg. */ static int partition_bound_bsearch(PartitionKey key, PartitionBoundInfo boundinfo, - void *probe, bool probe_is_bound, bool *is_equal) + PartitionBoundCmpArg *arg, bool *is_equal) { int lo, hi, @@ -2342,8 +2400,7 @@ partition_bound_bsearch(PartitionKey key, PartitionBoundInfo boundinfo, int32 cmpval; mid = (lo + hi + 1) / 2; - cmpval = partition_bound_cmp(key, boundinfo, mid, probe, - probe_is_bound); + cmpval = partition_bound_cmp(key, boundinfo, mid, arg); if (cmpval <= 0) { lo = mid; -- 2.11.0
From 240d2d65fbb4af5959df8e2e1ae576bf46440d4d Mon Sep 17 00:00:00 2001 From: amit <amitlangot...@gmail.com> Date: Tue, 22 Aug 2017 13:48:13 +0900 Subject: [PATCH 4/5] WIP: Implement get_partitions_for_keys() Disable constraint exclusion that occurs using internal partition constraints, so that it's apparent what the new partition-pruning code still needs to do to able to create a plan matching the plain the the traditional constraint exclusion based partition-pruning would result in. --- src/backend/catalog/partition.c | 193 ++++++++++++++++++++++++++++++++++- src/backend/optimizer/util/plancat.c | 4 + 2 files changed, 192 insertions(+), 5 deletions(-) diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c index 14876f8ea3..617900f62f 100644 --- a/src/backend/catalog/partition.c +++ b/src/backend/catalog/partition.c @@ -1194,8 +1194,6 @@ RelationGetPartitionDispatchInfo(Relation rel, * datum that the query's bounding keys allow to be returned for the query. * Similarly, *max_datum_index. *null_partition_chosen returns whether * the null partition will be scanned. - * - * TODO: Implement. */ List * get_partitions_for_keys(Relation rel, @@ -1205,12 +1203,197 @@ get_partitions_for_keys(Relation rel, int *min_datum_index, int *max_datum_index, bool *null_partition_chosen) { + int i, + minoff, + maxoff; List *result = NIL; - int i; + PartitionKey partkey = RelationGetPartitionKey(rel); PartitionDesc partdesc = RelationGetPartitionDesc(rel); + PartitionBoundCmpArg arg; + bool is_equal; + int null_partition_idx = partdesc->boundinfo->null_index; + + *null_partition_chosen = false; + + /* + * Check if any of the scan keys are null. If so, return the only + * null-accepting partition if partdesc->boundinfo says there is one. + */ + for (i = 0; i < partkey->partnatts; i++) + { + if (keynullness[i] == IS_NULL) + { + if (null_partition_idx >= 0) + { + *null_partition_chosen = true; + result = list_make1_int(null_partition_idx); + } + else + result = NIL; + + return result; + } + } + + if (n_minkeys > 0 && partdesc->nparts > 0) + { + memset(&arg, 0, sizeof(PartitionBoundCmpArg)); + arg.datums = minkeys; + arg.ndatums = n_minkeys; + minoff = partition_bound_bsearch(partkey, partdesc->boundinfo, + &arg, &is_equal); + + if (is_equal && arg.ndatums < partkey->partnatts) + { + int32 cmpval; + + is_equal = false; + + do + { + if (min_inclusive) + minoff -= 1; + else + minoff += 1; + if (minoff < 0 || + minoff >= partdesc->boundinfo->ndatums) + break; + cmpval = partition_bound_cmp(partkey, partdesc->boundinfo, + minoff, &arg); + } while (cmpval == 0); + } + + /* Interpret the result per partition strategy. */ + switch (partkey->strategy) + { + case PARTITION_STRATEGY_LIST: + /* + * Found, but if the query may have asked us to exclude it. + */ + if (is_equal && !min_inclusive) + minoff++; + break; + + case PARTITION_STRATEGY_RANGE: + /* + * Records returned by the query will be > bounds[minoff], + * because min_scankey is >= bounds[minoff], that is, no + * records of the partition at minoff will be returned. Go + * to the next bound. + */ + if (minoff < partdesc->boundinfo->ndatums - 1) + minoff += 1; + + /* + * Make sure to skip a gap. + * Note: There are ndatums + 1 lots in the indexes array. + */ + if (partdesc->boundinfo->indexes[minoff] < 0 && + partdesc->boundinfo->indexes[minoff + 1] >= 0) + minoff += 1; + break; + } + } + else + minoff = 0; + + if (n_maxkeys > 0 && partdesc->nparts > 0) + { + memset(&arg, 0, sizeof(PartitionBoundCmpArg)); + arg.datums = maxkeys; + arg.ndatums = n_maxkeys; + maxoff = partition_bound_bsearch(partkey, partdesc->boundinfo, + &arg, &is_equal); + if (is_equal && arg.ndatums < partkey->partnatts) + { + int32 cmpval; - for (i = 0; i < partdesc->nparts; i++) - result = lappend_int(result, i); + is_equal = false; + + do + { + if (max_inclusive) + maxoff += 1; + else + maxoff -= 1; + if (maxoff < 0 || + maxoff >= partdesc->boundinfo->ndatums) + break; + cmpval = partition_bound_cmp(partkey, partdesc->boundinfo, + maxoff, &arg); + } while (cmpval == 0); + + /* Back up if went too far. */ + if (max_inclusive) + maxoff -= 1; + } + + /* Interpret the result per partition strategy. */ + switch (partkey->strategy) + { + case PARTITION_STRATEGY_LIST: + /* + * Found, but if the query may have asked us to exclude it. + */ + if (is_equal && !max_inclusive) + maxoff--; + break; + + case PARTITION_STRATEGY_RANGE: + /* + * Because bounds[maxoff] <= max_scankey, we may need to + * to consider the next partition as well, in addition to + * the partition at maxoff and earlier. + */ + if (!is_equal || max_inclusive) + maxoff += 1; + + /* Make sure to skip a gap. */ + if (partdesc->boundinfo->indexes[maxoff] < 0 && maxoff >= 1) + maxoff -= 1; + break; + } + } + else + maxoff = partdesc->boundinfo->ndatums - 1; + + *min_datum_index = minoff; + *max_datum_index = maxoff; + + switch (partkey->strategy) + { + case PARTITION_STRATEGY_LIST: + for (i = minoff; i <= maxoff; i++) + { + int partition_idx = partdesc->boundinfo->indexes[i]; + + /* + * Multiple values may belong to the same partition, so make + * sure we don't add the same partition index again. + */ + result = list_append_unique_int(result, partition_idx); + } + + /* If no bounding keys exist, include the null partition too. */ + if (null_partition_idx >= 0 && + (keynullness[0] == -1 || keynullness[0] != IS_NOT_NULL)) + { + *null_partition_chosen = true; + result = list_append_unique_int(result, null_partition_idx); + } + + break; + + case PARTITION_STRATEGY_RANGE: + for (i = minoff; i <= maxoff; i++) + { + int partition_idx = partdesc->boundinfo->indexes[i]; + + if (partition_idx >= 0) + result = lappend_int(result, partition_idx); + } + break; + } return result; } diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c index de50b5d86a..ec51d89caa 100644 --- a/src/backend/optimizer/util/plancat.c +++ b/src/backend/optimizer/util/plancat.c @@ -1155,7 +1155,9 @@ get_relation_constraints(PlannerInfo *root, Index varno = rel->relid; Relation relation; TupleConstr *constr; +#ifdef USE_PARTITION_CONSTRAINT_FOR_PRUNING List *pcqual; +#endif /* * We assume the relation has already been safely locked. @@ -1241,6 +1243,7 @@ get_relation_constraints(PlannerInfo *root, } } +#ifdef USE_PARTITION_CONSTRAINT_FOR_PRUNING /* Append partition predicates, if any */ pcqual = RelationGetPartitionQual(relation); if (pcqual) @@ -1258,6 +1261,7 @@ get_relation_constraints(PlannerInfo *root, result = list_concat(result, pcqual); } +#endif heap_close(relation, NoLock); -- 2.11.0
From a8808856cb8a6e4553c72b8c0418a69b2bb1aa47 Mon Sep 17 00:00:00 2001 From: amit <amitlangot...@gmail.com> Date: Tue, 5 Sep 2017 10:28:23 +0900 Subject: [PATCH 5/5] Add more tests for the new partitioning-related planning code --- src/test/regress/expected/partition.out | 449 ++++++++++++++++++++++++++++++++ src/test/regress/parallel_schedule | 2 +- src/test/regress/serial_schedule | 1 + src/test/regress/sql/partition.sql | 80 ++++++ 4 files changed, 531 insertions(+), 1 deletion(-) create mode 100644 src/test/regress/expected/partition.out create mode 100644 src/test/regress/sql/partition.sql diff --git a/src/test/regress/expected/partition.out b/src/test/regress/expected/partition.out new file mode 100644 index 0000000000..ae5f59e3a6 --- /dev/null +++ b/src/test/regress/expected/partition.out @@ -0,0 +1,449 @@ +-- +-- Test partitioning planner code +-- +create table lp (a char) partition by list (a); +create table lp_ef partition of lp for values in ('e', 'f'); +create table lp_ad partition of lp for values in ('a', 'd'); +create table lp_bc partition of lp for values in ('b', 'c'); +create table lp_null partition of lp for values in (null); +explain (costs off) select * from lp; + QUERY PLAN +--------------------------- + Append + -> Seq Scan on lp_ad + -> Seq Scan on lp_bc + -> Seq Scan on lp_ef + -> Seq Scan on lp_null +(5 rows) + +explain (costs off) select * from lp where a > 'a' and a < 'd'; + QUERY PLAN +----------------------------------------------------------- + Append + -> Seq Scan on lp_bc + Filter: ((a > 'a'::bpchar) AND (a < 'd'::bpchar)) +(3 rows) + +explain (costs off) select * from lp where a > 'a' and a <= 'd'; + QUERY PLAN +------------------------------------------------------------ + Append + -> Seq Scan on lp_bc + Filter: ((a > 'a'::bpchar) AND (a <= 'd'::bpchar)) + -> Seq Scan on lp_ad + Filter: ((a > 'a'::bpchar) AND (a <= 'd'::bpchar)) +(5 rows) + +explain (costs off) select * from lp where a = 'a'; + QUERY PLAN +----------------------------------- + Append + -> Seq Scan on lp_ad + Filter: (a = 'a'::bpchar) +(3 rows) + +explain (costs off) select * from lp where a is not null; + QUERY PLAN +--------------------------------- + Append + -> Seq Scan on lp_ad + Filter: (a IS NOT NULL) + -> Seq Scan on lp_bc + Filter: (a IS NOT NULL) + -> Seq Scan on lp_ef + Filter: (a IS NOT NULL) +(7 rows) + +explain (costs off) select * from lp where a is null; + QUERY PLAN +----------------------------- + Append + -> Seq Scan on lp_null + Filter: (a IS NULL) +(3 rows) + +create table rlp (a int, b varchar) partition by range (a); +create table rlp1 partition of rlp for values from (minvalue) to (1); +create table rlp2 partition of rlp for values from (1) to (10); +create table rlp3 (b varchar, a int) partition by list (b varchar_ops); +create table rlp3abcd partition of rlp3 for values in ('ab', 'cd'); +create table rlp3efgh partition of rlp3 for values in ('ef', 'gh'); +create table rlp3nullxy partition of rlp3 for values in (null, 'xy'); +alter table rlp attach partition rlp3 for values from (15) to (20); +create table rlp4 partition of rlp for values from (20) to (30); +create table rlp5 partition of rlp for values from (31) to (maxvalue); +explain (costs off) select * from rlp where a < 1; + QUERY PLAN +------------------------- + Append + -> Seq Scan on rlp1 + Filter: (a < 1) +(3 rows) + +explain (costs off) select * from rlp where a <= 1; + QUERY PLAN +-------------------------- + Append + -> Seq Scan on rlp1 + Filter: (a <= 1) + -> Seq Scan on rlp2 + Filter: (a <= 1) +(5 rows) + +explain (costs off) select * from rlp where a = 1; + QUERY PLAN +------------------------- + Append + -> Seq Scan on rlp2 + Filter: (a = 1) +(3 rows) + +explain (costs off) select * from rlp where a = 1::bigint; /* same as above */ + QUERY PLAN +----------------------------------- + Append + -> Seq Scan on rlp2 + Filter: (a = '1'::bigint) +(3 rows) + +explain (costs off) select * from rlp where a = 1::numeric; /* no pruning */ + QUERY PLAN +----------------------------------------------- + Append + -> Seq Scan on rlp1 + Filter: ((a)::numeric = '1'::numeric) + -> Seq Scan on rlp2 + Filter: ((a)::numeric = '1'::numeric) + -> Seq Scan on rlp3abcd + Filter: ((a)::numeric = '1'::numeric) + -> Seq Scan on rlp3efgh + Filter: ((a)::numeric = '1'::numeric) + -> Seq Scan on rlp3nullxy + Filter: ((a)::numeric = '1'::numeric) + -> Seq Scan on rlp4 + Filter: ((a)::numeric = '1'::numeric) + -> Seq Scan on rlp5 + Filter: ((a)::numeric = '1'::numeric) +(15 rows) + +explain (costs off) select * from rlp where a <= 10; + QUERY PLAN +--------------------------- + Append + -> Seq Scan on rlp1 + Filter: (a <= 10) + -> Seq Scan on rlp2 + Filter: (a <= 10) +(5 rows) + +explain (costs off) select * from rlp where a > 10; + QUERY PLAN +------------------------------ + Append + -> Seq Scan on rlp3abcd + Filter: (a > 10) + -> Seq Scan on rlp3efgh + Filter: (a > 10) + -> Seq Scan on rlp3nullxy + Filter: (a > 10) + -> Seq Scan on rlp4 + Filter: (a > 10) + -> Seq Scan on rlp5 + Filter: (a > 10) +(11 rows) + +explain (costs off) select * from rlp where a < 15; + QUERY PLAN +-------------------------- + Append + -> Seq Scan on rlp1 + Filter: (a < 15) + -> Seq Scan on rlp2 + Filter: (a < 15) +(5 rows) + +explain (costs off) select * from rlp where a <= 15; + QUERY PLAN +------------------------------ + Append + -> Seq Scan on rlp1 + Filter: (a <= 15) + -> Seq Scan on rlp2 + Filter: (a <= 15) + -> Seq Scan on rlp3abcd + Filter: (a <= 15) + -> Seq Scan on rlp3efgh + Filter: (a <= 15) + -> Seq Scan on rlp3nullxy + Filter: (a <= 15) +(11 rows) + +explain (costs off) select * from rlp where a > 15 and b = 'ab'; + QUERY PLAN +--------------------------------------------------------- + Append + -> Seq Scan on rlp3abcd + Filter: ((a > 15) AND ((b)::text = 'ab'::text)) + -> Seq Scan on rlp4 + Filter: ((a > 15) AND ((b)::text = 'ab'::text)) + -> Seq Scan on rlp5 + Filter: ((a > 15) AND ((b)::text = 'ab'::text)) +(7 rows) + +explain (costs off) select * from rlp where a = 16 and b is null; + QUERY PLAN +-------------------------------------------- + Append + -> Seq Scan on rlp3nullxy + Filter: ((b IS NULL) AND (a = 16)) +(3 rows) + +explain (costs off) select * from rlp where a = 16 and b is not null; + QUERY PLAN +------------------------------------------------ + Append + -> Seq Scan on rlp3abcd + Filter: ((b IS NOT NULL) AND (a = 16)) + -> Seq Scan on rlp3efgh + Filter: ((b IS NOT NULL) AND (a = 16)) + -> Seq Scan on rlp3nullxy + Filter: ((b IS NOT NULL) AND (a = 16)) +(7 rows) + +explain (costs off) select * from rlp where a is null; /* while we're on nulls */ + QUERY PLAN +-------------------------- + Result + One-Time Filter: false +(2 rows) + +explain (costs off) select * from rlp where a > 30; + QUERY PLAN +-------------------------- + Append + -> Seq Scan on rlp5 + Filter: (a > 30) +(3 rows) + +explain (costs off) select * from rlp where a = 30; /* empty */ + QUERY PLAN +-------------------------- + Result + One-Time Filter: false +(2 rows) + +explain (costs off) select * from rlp where a <= 31; + QUERY PLAN +------------------------------ + Append + -> Seq Scan on rlp1 + Filter: (a <= 31) + -> Seq Scan on rlp2 + Filter: (a <= 31) + -> Seq Scan on rlp3abcd + Filter: (a <= 31) + -> Seq Scan on rlp3efgh + Filter: (a <= 31) + -> Seq Scan on rlp3nullxy + Filter: (a <= 31) + -> Seq Scan on rlp4 + Filter: (a <= 31) + -> Seq Scan on rlp5 + Filter: (a <= 31) +(15 rows) + +-- multi-column keys +create table mc3p (a int, b int, c int) partition by range (a, abs(b), c); +create table mc3p0 partition of mc3p for values from (minvalue, 0, 0) to (1, 1, 1); +create table mc3p1 partition of mc3p for values from (1, 1, 1) to (10, 5, 10); +create table mc3p2 partition of mc3p for values from (10, 5, 10) to (10, 10, 10); +create table mc3p3 partition of mc3p for values from (10, 10, 10) to (10, 10, 20); +create table mc3p4 partition of mc3p for values from (10, 10, 20) to (10, maxvalue, maxvalue); +create table mc3p5 partition of mc3p for values from (11, 1, 1) to (20, 10, 10); +create table mc3p6 partition of mc3p for values from (20, 10, 10) to (20, 20, 20); +create table mc3p7 partition of mc3p for values from (20, 20, 20) to (maxvalue, 0, 0); +explain (costs off) select * from mc3p where a = 1; + QUERY PLAN +------------------------- + Append + -> Seq Scan on mc3p0 + Filter: (a = 1) + -> Seq Scan on mc3p1 + Filter: (a = 1) +(5 rows) + +explain (costs off) select * from mc3p where a = 1 and abs(b) < 1; + QUERY PLAN +-------------------------------------------- + Append + -> Seq Scan on mc3p0 + Filter: ((a = 1) AND (abs(b) < 1)) +(3 rows) + +explain (costs off) select * from mc3p where a = 1 and abs(b) = 1; + QUERY PLAN +-------------------------------------------- + Append + -> Seq Scan on mc3p0 + Filter: ((a = 1) AND (abs(b) = 1)) + -> Seq Scan on mc3p1 + Filter: ((a = 1) AND (abs(b) = 1)) +(5 rows) + +explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8; + QUERY PLAN +-------------------------------------------------------- + Append + -> Seq Scan on mc3p0 + Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1)) + -> Seq Scan on mc3p1 + Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1)) +(5 rows) + +explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35; + QUERY PLAN +----------------------------------------------------------------- + Append + -> Seq Scan on mc3p1 + Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35)) + -> Seq Scan on mc3p2 + Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35)) + -> Seq Scan on mc3p3 + Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35)) + -> Seq Scan on mc3p4 + Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35)) +(9 rows) + +explain (costs off) select * from mc3p where a > 10; + QUERY PLAN +-------------------------- + Append + -> Seq Scan on mc3p5 + Filter: (a > 10) + -> Seq Scan on mc3p6 + Filter: (a > 10) + -> Seq Scan on mc3p7 + Filter: (a > 10) +(7 rows) + +explain (costs off) select * from mc3p where a >= 10; + QUERY PLAN +--------------------------- + Append + -> Seq Scan on mc3p1 + Filter: (a >= 10) + -> Seq Scan on mc3p2 + Filter: (a >= 10) + -> Seq Scan on mc3p3 + Filter: (a >= 10) + -> Seq Scan on mc3p4 + Filter: (a >= 10) + -> Seq Scan on mc3p5 + Filter: (a >= 10) + -> Seq Scan on mc3p6 + Filter: (a >= 10) + -> Seq Scan on mc3p7 + Filter: (a >= 10) +(15 rows) + +explain (costs off) select * from mc3p where a < 10; + QUERY PLAN +-------------------------- + Append + -> Seq Scan on mc3p0 + Filter: (a < 10) + -> Seq Scan on mc3p1 + Filter: (a < 10) +(5 rows) + +explain (costs off) select * from mc3p where a <= 10 and abs(b) < 10; + QUERY PLAN +----------------------------------------------- + Append + -> Seq Scan on mc3p0 + Filter: ((a <= 10) AND (abs(b) < 10)) + -> Seq Scan on mc3p1 + Filter: ((a <= 10) AND (abs(b) < 10)) + -> Seq Scan on mc3p2 + Filter: ((a <= 10) AND (abs(b) < 10)) +(7 rows) + +explain (costs off) select * from mc3p where a = 11 and abs(b) = 0; /* empty */ + QUERY PLAN +-------------------------- + Result + One-Time Filter: false +(2 rows) + +explain (costs off) select * from mc3p where a = 20 and abs(b) = 10 and c = 100; + QUERY PLAN +------------------------------------------------------------ + Append + -> Seq Scan on mc3p6 + Filter: ((a = 20) AND (c = 100) AND (abs(b) = 10)) +(3 rows) + +explain (costs off) select * from mc3p where a > 20; + QUERY PLAN +-------------------------- + Append + -> Seq Scan on mc3p7 + Filter: (a > 20) +(3 rows) + +explain (costs off) select * from mc3p where a >= 20; + QUERY PLAN +--------------------------- + Append + -> Seq Scan on mc3p5 + Filter: (a >= 20) + -> Seq Scan on mc3p6 + Filter: (a >= 20) + -> Seq Scan on mc3p7 + Filter: (a >= 20) +(7 rows) + +-- XXX - '<>' clauses cannot be handled yet +explain (costs off) select * from lp where a <> 'a'; + QUERY PLAN +------------------------------------ + Append + -> Seq Scan on lp_ad + Filter: (a <> 'a'::bpchar) + -> Seq Scan on lp_bc + Filter: (a <> 'a'::bpchar) + -> Seq Scan on lp_ef + Filter: (a <> 'a'::bpchar) +(7 rows) + +-- XXX - redundant clause elimination does not happen yet +explain (costs off) select * from mc3p where a = 10 and a > 1; + QUERY PLAN +---------------------------------------- + Append + -> Seq Scan on mc3p0 + Filter: ((a > 1) AND (a = 10)) + -> Seq Scan on mc3p1 + Filter: ((a > 1) AND (a = 10)) + -> Seq Scan on mc3p2 + Filter: ((a > 1) AND (a = 10)) + -> Seq Scan on mc3p3 + Filter: ((a > 1) AND (a = 10)) + -> Seq Scan on mc3p4 + Filter: ((a > 1) AND (a = 10)) +(11 rows) + +-- XXX - the OR clauses don't contribute to partition-pruning yet +explain (costs off) select * from rlp3 where b = 'ab' or b = 'ef'; + QUERY PLAN +------------------------------------------------------------------------ + Append + -> Seq Scan on rlp3abcd + Filter: (((b)::text = 'ab'::text) OR ((b)::text = 'ef'::text)) + -> Seq Scan on rlp3efgh + Filter: (((b)::text = 'ab'::text) OR ((b)::text = 'ef'::text)) + -> Seq Scan on rlp3nullxy + Filter: (((b)::text = 'ab'::text) OR ((b)::text = 'ef'::text)) +(7 rows) + +drop table lp, rlp, mc3p; diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule index 2fd3f2b1b1..2eb81fcf41 100644 --- a/src/test/regress/parallel_schedule +++ b/src/test/regress/parallel_schedule @@ -60,7 +60,7 @@ test: create_index create_view # ---------- # Another group of parallel tests # ---------- -test: create_aggregate create_function_3 create_cast constraints triggers inherit create_table_like typed_table vacuum drop_if_exists updatable_views rolenames roleattributes create_am hash_func +test: create_aggregate create_function_3 create_cast constraints triggers inherit create_table_like typed_table vacuum drop_if_exists updatable_views rolenames roleattributes create_am hash_func partition # ---------- # sanity_check does a vacuum, affecting the sort order of SELECT * diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule index 76b0de30a7..6611662149 100644 --- a/src/test/regress/serial_schedule +++ b/src/test/regress/serial_schedule @@ -71,6 +71,7 @@ test: create_cast test: constraints test: triggers test: inherit +test: partition test: create_table_like test: typed_table test: vacuum diff --git a/src/test/regress/sql/partition.sql b/src/test/regress/sql/partition.sql new file mode 100644 index 0000000000..423ac3726f --- /dev/null +++ b/src/test/regress/sql/partition.sql @@ -0,0 +1,80 @@ +-- +-- Test partitioning planner code +-- +create table lp (a char) partition by list (a); +create table lp_ef partition of lp for values in ('e', 'f'); +create table lp_ad partition of lp for values in ('a', 'd'); +create table lp_bc partition of lp for values in ('b', 'c'); +create table lp_null partition of lp for values in (null); +explain (costs off) select * from lp; +explain (costs off) select * from lp where a > 'a' and a < 'd'; +explain (costs off) select * from lp where a > 'a' and a <= 'd'; +explain (costs off) select * from lp where a = 'a'; +explain (costs off) select * from lp where a is not null; +explain (costs off) select * from lp where a is null; + +create table rlp (a int, b varchar) partition by range (a); +create table rlp1 partition of rlp for values from (minvalue) to (1); +create table rlp2 partition of rlp for values from (1) to (10); + +create table rlp3 (b varchar, a int) partition by list (b varchar_ops); +create table rlp3abcd partition of rlp3 for values in ('ab', 'cd'); +create table rlp3efgh partition of rlp3 for values in ('ef', 'gh'); +create table rlp3nullxy partition of rlp3 for values in (null, 'xy'); +alter table rlp attach partition rlp3 for values from (15) to (20); + +create table rlp4 partition of rlp for values from (20) to (30); +create table rlp5 partition of rlp for values from (31) to (maxvalue); + +explain (costs off) select * from rlp where a < 1; +explain (costs off) select * from rlp where a <= 1; +explain (costs off) select * from rlp where a = 1; +explain (costs off) select * from rlp where a = 1::bigint; /* same as above */ +explain (costs off) select * from rlp where a = 1::numeric; /* no pruning */ +explain (costs off) select * from rlp where a <= 10; +explain (costs off) select * from rlp where a > 10; +explain (costs off) select * from rlp where a < 15; +explain (costs off) select * from rlp where a <= 15; +explain (costs off) select * from rlp where a > 15 and b = 'ab'; +explain (costs off) select * from rlp where a = 16 and b is null; +explain (costs off) select * from rlp where a = 16 and b is not null; +explain (costs off) select * from rlp where a is null; /* while we're on nulls */ +explain (costs off) select * from rlp where a > 30; +explain (costs off) select * from rlp where a = 30; /* empty */ +explain (costs off) select * from rlp where a <= 31; + +-- multi-column keys +create table mc3p (a int, b int, c int) partition by range (a, abs(b), c); +create table mc3p0 partition of mc3p for values from (minvalue, 0, 0) to (1, 1, 1); +create table mc3p1 partition of mc3p for values from (1, 1, 1) to (10, 5, 10); +create table mc3p2 partition of mc3p for values from (10, 5, 10) to (10, 10, 10); +create table mc3p3 partition of mc3p for values from (10, 10, 10) to (10, 10, 20); +create table mc3p4 partition of mc3p for values from (10, 10, 20) to (10, maxvalue, maxvalue); +create table mc3p5 partition of mc3p for values from (11, 1, 1) to (20, 10, 10); +create table mc3p6 partition of mc3p for values from (20, 10, 10) to (20, 20, 20); +create table mc3p7 partition of mc3p for values from (20, 20, 20) to (maxvalue, 0, 0); + +explain (costs off) select * from mc3p where a = 1; +explain (costs off) select * from mc3p where a = 1 and abs(b) < 1; +explain (costs off) select * from mc3p where a = 1 and abs(b) = 1; +explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8; +explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35; +explain (costs off) select * from mc3p where a > 10; +explain (costs off) select * from mc3p where a >= 10; +explain (costs off) select * from mc3p where a < 10; +explain (costs off) select * from mc3p where a <= 10 and abs(b) < 10; +explain (costs off) select * from mc3p where a = 11 and abs(b) = 0; /* empty */ +explain (costs off) select * from mc3p where a = 20 and abs(b) = 10 and c = 100; +explain (costs off) select * from mc3p where a > 20; +explain (costs off) select * from mc3p where a >= 20; + +-- XXX - '<>' clauses cannot be handled yet +explain (costs off) select * from lp where a <> 'a'; + +-- XXX - redundant clause elimination does not happen yet +explain (costs off) select * from mc3p where a = 10 and a > 1; + +-- XXX - the OR clauses don't contribute to partition-pruning yet +explain (costs off) select * from rlp3 where b = 'ab' or b = 'ef'; + +drop table lp, rlp, mc3p; -- 2.11.0
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers