Hi David, On 2017/09/27 6:04, David Rowley wrote: > On 25 September 2017 at 23:04, Amit Langote > <langote_amit...@lab.ntt.co.jp> wrote: >> By the way, I'm now rebasing these patches on top of [1] and will try to >> merge your refactoring patch in some appropriate way. Will post more >> tomorrow. >> >> [1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=9140cf826 > > Yeah, I see 0001 conflicts with that. I'm going to set this to waiting > on author while you're busy rebasing this.
Thanks for the reminder. Just thought I'd say that while I'm actually done rebasing itself (attaching rebased patches to try 'em out), I'm now considering Robert's comments and will be busy for a bit revising things based on those comments. Some notes about the attached patches: - 0001 includes refactoring that Dilip proposed upthread [1] (added him as an author). I slightly tweaked his patch -- renamed the function get_matching_clause to match_clauses_to_partkey, similar to match_clauses_to_index. - Code to set AppendPath's partitioned_rels in add_paths_to_append_rel() revised by 0a480502b09 (was originally introduced in d3cc37f1d80) is still revised to get partitioned_rels from a source that is not PlannerInfo.pcinfo_list. With the new code, partitioned_rels won't contain RT indexes of the partitioned child tables that were pruned. Thanks, Amit [1] https://www.postgresql.org/message-id/CAFiTN-tGnQzF_4QtbOHT-3hE%3DOvNaMfbbeRxa4UY0CQyF0G8gQ%40mail.gmail.com
From f20aebcad9b089434ba60cd4439fa1a9d55091b8 Mon Sep 17 00:00:00 2001 From: amit <amitlangot...@gmail.com> Date: Wed, 13 Sep 2017 18:24:55 +0900 Subject: [PATCH 1/4] WIP: planner-side changes for partition-pruning Firstly, this adds a stub get_partitions_for_keys() in partition.c with appropriate interface for the caller to specify bounding scan keys, along with other information about the scan keys extracted from the query, such as NULL-ness of the keys, inclusive-ness, etc. More importantly, this implements the planner-side logic to extract bounding scan keys to be passed to get_partitions_for_keys. That is, it will go through rel->baserestrictinfo and match individual clauses to partition keys and construct lower bound and upper bound tuples, which may cover only a prefix of a multi-column partition key. A bunch of smarts are still missing when mapping the clause operands with keys. For example, code to match a clause is specifed as (constant op var) doesn't exist. Also, redundant keys are not eliminated, for example, a combination of clauses a = 10 and a > 1 will cause the later clause a > 1 taking over and resulting in needless scanning of partitions containing values a > 1 and a < 10. ...constraint exclusion is still used, because get_partitions_for_keys is just a stub... Authors: Amit Langote, Dilip Kumar --- src/backend/catalog/partition.c | 43 ++++ src/backend/optimizer/path/allpaths.c | 390 ++++++++++++++++++++++++++++++---- src/backend/optimizer/util/plancat.c | 10 + src/backend/optimizer/util/relnode.c | 7 + src/include/catalog/partition.h | 9 + src/include/nodes/nodes.h | 1 + src/include/nodes/relation.h | 66 ++++++ 7 files changed, 487 insertions(+), 39 deletions(-) diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c index 1ab6dba7ae..ccf8a1fa67 100644 --- a/src/backend/catalog/partition.c +++ b/src/backend/catalog/partition.c @@ -1335,6 +1335,49 @@ get_partition_dispatch_recurse(Relation rel, Relation parent, } } +/* + * get_partitions_for_keys + * Returns the list of indexes of rel's partitions that will need to be + * scanned given the bounding scan keys. + * + * Each value in the returned list can be used as an index into the oids array + * of the partition descriptor. + * + * Inputs: + * keynullness contains between 0 and (key->partnatts - 1) values, each + * telling what kind of NullTest has been applies to the corresponding + * partition key column. minkeys represents the lower bound on the partition + * the key of the records that the query will return, while maxkeys + * represents upper bound. min_inclusive and max_inclusive tell whether the + * bounds specified minkeys and maxkeys is inclusive, respectively. + * + * Other outputs: + * *min_datum_index will return the index in boundinfo->datums of the first + * datum that the query's bounding keys allow to be returned for the query. + * Similarly, *max_datum_index. *null_partition_chosen returns whether + * the null partition will be scanned. + * + * TODO: Implement. + */ +List * +get_partitions_for_keys(Relation rel, + NullTestType *keynullness, + Datum *minkeys, int n_minkeys, bool min_inclusive, + Datum *maxkeys, int n_maxkeys, bool max_inclusive, + int *min_datum_index, int *max_datum_index, + bool *null_partition_chosen, + bool *default_partition_chosen) +{ + List *result = NIL; + int i; + PartitionDesc partdesc = RelationGetPartitionDesc(rel); + + for (i = 0; i < partdesc->nparts; i++) + result = lappend_int(result, i); + + return result; +} + /* Module-local functions */ /* diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c index a7866a99e0..47f88d1a8f 100644 --- a/src/backend/optimizer/path/allpaths.c +++ b/src/backend/optimizer/path/allpaths.c @@ -20,6 +20,7 @@ #include "access/sysattr.h" #include "access/tsmapi.h" +#include "catalog/partition.h" #include "catalog/pg_class.h" #include "catalog/pg_operator.h" #include "catalog/pg_proc.h" @@ -135,6 +136,10 @@ static void recurse_push_qual(Node *setOp, Query *topquery, static void remove_unused_subquery_outputs(Query *subquery, RelOptInfo *rel); static void add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel, List *live_childrels); +static void match_clauses_to_partkey(RelOptInfo *rel, + List *clauses, + List **matchedclauses, + NullTestType *keynullness); /* @@ -846,6 +851,279 @@ set_foreign_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte) } /* + * match_clauses_to_partkey + * Match the clauses in the list with partition's partition key + * + * Matched clauses are returned matchedclauses, which is an array with + * partnatts members, where each member is a list of clauses matched to the + * respective partition key. keynullness array also contains partnatts + * members where each member corresponds to the type of the NullTest + * encountered for a given partition key. + */ +void +match_clauses_to_partkey(RelOptInfo *rel, + List *clauses, + List **matchedclauses, + NullTestType *keynullness) +{ + ListCell *lc; + int keyPos; + int i; + + /* + * Match individual OpExprs in the query's restriction with individual + * partition key columns. There is one list per key. + */ + memset(keynullness, -1, PARTITION_MAX_KEYS * sizeof(NullTestType)); + memset(matchedclauses, 0, PARTITION_MAX_KEYS * sizeof(List*)); + keyPos = 0; + for (i = 0; i < rel->part_scheme->partnatts; i++) + { + Node *partkey = linitial(rel->partexprs[i]); + + foreach(lc, clauses) + { + RestrictInfo *rinfo = lfirst(lc); + Expr *clause = rinfo->clause; + + if (is_opclause(clause)) + { + Node *leftop = get_leftop(clause), + *rightop = get_rightop(clause); + Oid expr_op = ((OpExpr *) clause)->opno; + + if (IsA(leftop, RelabelType)) + leftop = (Node *) ((RelabelType *) leftop)->arg; + if (IsA(rightop, RelabelType)) + rightop = (Node *) ((RelabelType *) rightop)->arg; + + if (equal(leftop, partkey)) + { + matchedclauses[keyPos] = lappend(matchedclauses[keyPos], + clause); + /* A strict operator implies NOT NULL argument. */ + keynullness[keyPos] = IS_NOT_NULL; + } + else if (equal(rightop, partkey)) + { + Oid commutator = get_commutator(expr_op); + + if (OidIsValid(commutator)) + { + OpExpr *commutated_expr; + + /* + * Generate a commutated copy of the expression, but + * try to make it look valid, because we only want + * it to put the constant operand in a place that the + * following code knows as the only place to find it. + */ + commutated_expr = (OpExpr *) copyObject(clause); + commutated_expr->opno = commutator; /* really? */ + commutated_expr->args = list_make2(rightop, leftop); + matchedclauses[keyPos] = + lappend(matchedclauses[keyPos], + commutated_expr); + /* A strict operator implies NOT NULL argument. */ + keynullness[keyPos] = IS_NOT_NULL; + } + } + } + else if (IsA(clause, NullTest)) + { + NullTest *nulltest = (NullTest *) clause; + Node *arg = (Node *) nulltest->arg; + + if (equal(arg, partkey)) + keynullness[keyPos] = nulltest->nulltesttype; + } + } + + /* Onto finding clauses matching the next partition key. */ + keyPos++; + } +} + +/* + * get_rel_partitions + * Return the list of partitions of rel that pass the query clauses + * + * Returned list contains the AppendInfos of the chosen partitions. + */ +static List * +get_rel_partitions(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte) +{ + Relation parent = heap_open(rte->relid, NoLock); + List *indexes; + List *result = NIL; + ListCell *lc1; + List *matchedclauses[PARTITION_MAX_KEYS]; + NullTestType keynullness[PARTITION_MAX_KEYS]; + Datum minkeys[PARTITION_MAX_KEYS], + maxkeys[PARTITION_MAX_KEYS]; + bool need_next_min, + need_next_max, + minkey_set[PARTITION_MAX_KEYS], + maxkey_set[PARTITION_MAX_KEYS], + min_incl, + max_incl; + int n_minkeys = 0, + n_maxkeys = 0, + i; + + /* + * Get the clauses that match the partition key, including information + * about any nullness tests against partition keys. + */ + match_clauses_to_partkey(rel, + rel->baserestrictinfo, + matchedclauses, + keynullness); + + /* + * Determine the min keys and the max keys using btree semantics-based + * interpretation of the clauses' operators. + */ + + /* + * XXX - There should be a step similar to _bt_preprocess_keys() here, + * to eliminate any redundant scan keys for a given partition column. For + * example, among a <= 4 and a <= 5, we can only keep a <= 4 for being + * more restrictive and discard a <= 5. While doing that, we can also + * check to see if there exists a contradictory combination of scan keys + * that makes the query trivially false for all records in the table. + */ + memset(minkeys, 0, sizeof(minkeys)); + memset(maxkeys, 0, sizeof(maxkeys)); + memset(minkey_set, false, sizeof(minkey_set)); + memset(maxkey_set, false, sizeof(maxkey_set)); + need_next_min = true; + need_next_max = true; + for (i = 0; i < rel->part_scheme->partnatts; i++) + { + /* + * If no scan key existed for the previous column, we are done. + */ + if (i > n_minkeys) + need_next_min = false; + + if (i > n_maxkeys) + need_next_max = false; + + foreach(lc1, matchedclauses[i]) + { + Expr *clause = lfirst(lc1); + Const *rightop = (Const *) get_rightop(clause); + Oid opno = ((OpExpr *) clause)->opno, + opfamily = rel->part_scheme->partopfamily[i]; + StrategyNumber strategy; + + strategy = get_op_opfamily_strategy(opno, opfamily); + switch (strategy) + { + case BTLessStrategyNumber: + case BTLessEqualStrategyNumber: + if (need_next_max) + { + maxkeys[i] = rightop->constvalue; + if (!maxkey_set[i]) + n_maxkeys++; + maxkey_set[i] = true; + max_incl = (strategy == BTLessEqualStrategyNumber); + } + if (strategy == BTLessStrategyNumber) + need_next_max = false; + break; + + case BTGreaterStrategyNumber: + case BTGreaterEqualStrategyNumber: + if (need_next_min) + { + minkeys[i] = rightop->constvalue; + if (!minkey_set[i]) + n_minkeys++; + minkey_set[i] = true; + min_incl = (strategy == BTGreaterEqualStrategyNumber); + } + if (strategy == BTGreaterStrategyNumber) + need_next_min = false; + break; + + case BTEqualStrategyNumber: + if (need_next_min) + { + minkeys[i] = rightop->constvalue; + if (!minkey_set[i]) + n_minkeys++; + } + minkey_set[i] = true; + min_incl = true; + + if (need_next_max) + { + maxkeys[i] = rightop->constvalue; + if (!maxkey_set[i]) + n_maxkeys++; + } + maxkey_set[i] = true; + max_incl = true; + break; + + /* + * This might mean '<>', but we don't have anything for that + * case yet. Perhaps, handle that as key < const OR + * key > const, once we have props needed for handling OR + * clauses. + */ + default: + min_incl = max_incl = false; + break; + } + } + } + + /* Ask partition.c which partitions it thinks match the keys. */ + indexes = get_partitions_for_keys(parent, keynullness, + minkeys, n_minkeys, min_incl, + maxkeys, n_maxkeys, max_incl, + &rel->painfo->min_datum_idx, + &rel->painfo->max_datum_idx, + &rel->painfo->contains_null_partition, + &rel->painfo->contains_default_partition); + + if (indexes != NIL) + { +#ifdef USE_ASSERT_CHECKING + int first_index, + last_index; + first_index = linitial_int(indexes); + last_index = llast_int(indexes); + Assert(first_index <= last_index || + rel->part_scheme->strategy != PARTITION_STRATEGY_RANGE); +#endif + + foreach(lc1, indexes) + { + int partidx = lfirst_int(lc1); + AppendRelInfo *appinfo = rel->part_appinfos[partidx]; +#ifdef USE_ASSERT_CHECKING + RangeTblEntry *rte = planner_rt_fetch(appinfo->child_relid, root); + PartitionDesc partdesc = RelationGetPartitionDesc(parent); + Assert(partdesc->oids[partidx] == rte->relid); +#endif + result = lappend(result, appinfo); + } + } + + /* Remember for future users such as set_append_rel_pathlist(). */ + rel->painfo->live_partition_appinfos = result; + + heap_close(parent, NoLock); + + return result; +} + +/* * set_append_rel_size * Set size estimates for a simple "append relation" * @@ -860,6 +1138,7 @@ static void set_append_rel_size(PlannerInfo *root, RelOptInfo *rel, Index rti, RangeTblEntry *rte) { + List *rel_appinfos = NIL; int parentRTindex = rti; bool has_live_children; double parent_rows; @@ -873,6 +1152,24 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel, Assert(IS_SIMPLE_REL(rel)); + if (rte->relkind != RELKIND_PARTITIONED_TABLE) + { + foreach (l, root->append_rel_list) + { + AppendRelInfo *appinfo = lfirst(l); + + /* append_rel_list contains all append rels; ignore others */ + if (appinfo->parent_relid == parentRTindex) + rel_appinfos = lappend(rel_appinfos, appinfo); + } + } + else + { + rel_appinfos = get_rel_partitions(root, rel, rte); + Assert(rel->painfo != NULL); + rel->painfo->live_partitioned_rels = list_make1_int(rti); + } + /* * Initialize to compute size estimates for whole append relation. * @@ -893,7 +1190,7 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel, nattrs = rel->max_attr - rel->min_attr + 1; parent_attrsizes = (double *) palloc0(nattrs * sizeof(double)); - foreach(l, root->append_rel_list) + foreach(l, rel_appinfos) { AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l); int childRTindex; @@ -906,10 +1203,6 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel, ListCell *childvars; ListCell *lc; - /* append_rel_list contains all append rels; ignore others */ - if (appinfo->parent_relid != parentRTindex) - continue; - childRTindex = appinfo->child_relid; childRTE = root->simple_rte_array[childRTindex]; @@ -1118,6 +1411,17 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel, has_live_children = true; /* + * If childrel is itself partitioned, add it and its partitioned + * children to the list being propagated up to the root rel. + */ + if (childrel->painfo && rel->painfo) + { + rel->painfo->live_partitioned_rels = + list_concat(rel->painfo->live_partitioned_rels, + list_copy(childrel->painfo->live_partitioned_rels)); + } + + /* * If any live child is not parallel-safe, treat the whole appendrel * as not parallel-safe. In future we might be able to generate plans * in which some children are farmed out to workers while others are @@ -1213,14 +1517,29 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, Index rti, RangeTblEntry *rte) { int parentRTindex = rti; - List *live_childrels = NIL; + List *rel_appinfos = NIL, + *live_childrels = NIL; ListCell *l; + if (rte->relkind != RELKIND_PARTITIONED_TABLE) + { + foreach (l, root->append_rel_list) + { + AppendRelInfo *appinfo = lfirst(l); + + /* append_rel_list contains all append rels; ignore others */ + if (appinfo->parent_relid == parentRTindex) + rel_appinfos = lappend(rel_appinfos, appinfo); + } + } + else + rel_appinfos = rel->painfo->live_partition_appinfos; + /* * Generate access paths for each member relation, and remember the * non-dummy children. */ - foreach(l, root->append_rel_list) + foreach(l, rel_appinfos) { AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l); int childRTindex; @@ -1291,33 +1610,31 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel, ListCell *l; List *partitioned_rels = NIL; RangeTblEntry *rte; - bool build_partitioned_rels = false; /* - * A root partition will already have a PartitionedChildRelInfo, and a - * non-root partitioned table doesn't need one, because its Append paths - * will get flattened into the parent anyway. For a subquery RTE, no - * PartitionedChildRelInfo exists; we collect all partitioned_rels - * associated with any child. (This assumes that we don't need to look - * through multiple levels of subquery RTEs; if we ever do, we could - * create a PartitionedChildRelInfo with the accumulated list of - * partitioned_rels which would then be found when populated our parent - * rel with paths. For the present, that appears to be unnecessary.) + * AppendPath we are about to generate must record the RT indexes of + * partitioned tables that are direct or indirect children of this Append + * rel. For partitioned tables, we collect its live partitioned children + * from rel->painfo. However, it will contain only its immediate children, + * so collect live partitioned children from all children that are + * themselves partitioned and concatenate to our list before finally + * passing the list to create_append_path() and/or + * generate_mergeappend_paths(). + * + * If this is a sub-query RTE, its RelOptInfo doesn't itself contain the + * list of live partitioned children, so we must assemble the same in the + * loop below from the children that are known to correspond to + * partitioned rels. (This assumes that we don't need to look through + * multiple levels of subquery RTEs; if we ever do, we could consider + * stuffing the list we generate here into sub-query RTE's RelOptInfo, just + * like we do for partitioned rels, which would be used when populating our + * parent rel with paths. For the present, that appears to be + * unnecessary.) */ rte = planner_rt_fetch(rel->relid, root); - switch (rte->rtekind) - { - case RTE_RELATION: - if (rte->relkind == RELKIND_PARTITIONED_TABLE) - partitioned_rels = - get_partitioned_child_rels(root, rel->relid); - break; - case RTE_SUBQUERY: - build_partitioned_rels = true; - break; - default: - elog(ERROR, "unexpected rtekind: %d", (int) rte->rtekind); - } + if (rte->rtekind == RTE_RELATION && + rte->relkind == RELKIND_PARTITIONED_TABLE) + partitioned_rels = rel->painfo->live_partitioned_rels; /* * For every non-dummy child, remember the cheapest path. Also, identify @@ -1330,17 +1647,12 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel, ListCell *lcp; /* - * If we need to build partitioned_rels, accumulate the partitioned - * rels for this child. + * Accumulate the live partitioned children of this child, if it's + * itself partitioned rel. */ - if (build_partitioned_rels) - { - List *cprels; - - cprels = get_partitioned_child_rels(root, childrel->relid); + if (childrel->painfo) partitioned_rels = list_concat(partitioned_rels, - list_copy(cprels)); - } + childrel->painfo->live_partitioned_rels); /* * If child has an unparameterized cheapest-total path, add that to diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c index cac46bedf9..49578c0684 100644 --- a/src/backend/optimizer/util/plancat.c +++ b/src/backend/optimizer/util/plancat.c @@ -1833,6 +1833,16 @@ set_relation_partition_info(PlannerInfo *root, RelOptInfo *rel, rel->boundinfo = partdesc->boundinfo; rel->nparts = partdesc->nparts; rel->partexprs = build_baserel_partition_key_exprs(relation, rel->relid); + + /* + * A PartitionAppendInfo to map this table to its immediate partitions + * that will be scanned by this query. At the same time, it records the + * table's partitioning properties reflecting any partition-pruning that + * might've occurred to satisfy the query. Rest of the fields are set in + * get_rel_partitions() and set_append_rel_size(). + */ + rel->painfo = makeNode(PartitionAppendInfo); + rel->painfo->boundinfo = partdesc->boundinfo; } /* diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c index 077e89ae43..24c8442eae 100644 --- a/src/backend/optimizer/util/relnode.c +++ b/src/backend/optimizer/util/relnode.c @@ -17,6 +17,7 @@ #include <limits.h> #include "miscadmin.h" +#include "catalog/pg_class.h" #include "optimizer/clauses.h" #include "optimizer/cost.h" #include "optimizer/pathnode.h" @@ -151,6 +152,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent) rel->boundinfo = NULL; rel->part_rels = NULL; rel->partexprs = NULL; + rel->painfo = NULL; /* * Pass top parent's relids down the inheritance hierarchy. If the parent @@ -227,8 +229,12 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent) int cnt_parts = 0; if (nparts > 0) + { + rel->part_appinfos = (AppendRelInfo **) + palloc(sizeof(AppendRelInfo *) * nparts); rel->part_rels = (RelOptInfo **) palloc(sizeof(RelOptInfo *) * nparts); + } foreach(l, root->append_rel_list) { @@ -252,6 +258,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent) * also match the PartitionDesc. See expand_partitioned_rtentry. */ Assert(cnt_parts < nparts); + rel->part_appinfos[cnt_parts] = appinfo; rel->part_rels[cnt_parts] = childrel; cnt_parts++; } diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h index 454a940a23..4f27d3018e 100644 --- a/src/include/catalog/partition.h +++ b/src/include/catalog/partition.h @@ -99,6 +99,7 @@ extern int get_partition_for_tuple(PartitionDispatch *pd, EState *estate, PartitionDispatchData **failed_at, TupleTableSlot **failed_slot); + extern Oid get_default_oid_from_partdesc(PartitionDesc partdesc); extern Oid get_default_partition_oid(Oid parentId); extern void update_default_partition_oid(Oid parentId, Oid defaultPartId); @@ -106,4 +107,12 @@ extern void check_default_allows_bound(Relation parent, Relation defaultRel, PartitionBoundSpec *new_spec); extern List *get_proposed_default_constraint(List *new_part_constaints); +/* Planner support stuff. */ +extern List *get_partitions_for_keys(Relation rel, + NullTestType *keynullness, + Datum *minkeys, int n_minkeys, bool min_inclusive, + Datum *maxkeys, int n_maxkeys, bool max_inclusive, + int *min_datum_index, int *max_datum_index, + bool *null_partition_chosen, + bool *default_partition_chosen); #endif /* PARTITION_H */ diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h index 27bd4f3363..63196a1211 100644 --- a/src/include/nodes/nodes.h +++ b/src/include/nodes/nodes.h @@ -261,6 +261,7 @@ typedef enum NodeTag T_SpecialJoinInfo, T_AppendRelInfo, T_PartitionedChildRelInfo, + T_PartitionAppendInfo, T_PlaceHolderInfo, T_MinMaxAggInfo, T_PlannerParamItem, diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h index 48e6012f7f..6dfa28bd1b 100644 --- a/src/include/nodes/relation.h +++ b/src/include/nodes/relation.h @@ -524,6 +524,7 @@ typedef struct PartitionSchemeData *PartitionScheme; * part_scheme - Partitioning scheme of the relation * boundinfo - Partition bounds * nparts - Number of partitions + * part_appinfos - AppendRelInfo of each partition * part_rels - RelOptInfos for each partition * partexprs - Partition key expressions * @@ -561,6 +562,9 @@ typedef enum RelOptKind /* Is the given relation an "other" relation? */ #define IS_OTHER_REL(rel) ((rel)->reloptkind == RELOPT_OTHER_MEMBER_REL) +typedef struct AppendRelInfo AppendRelInfo; +typedef struct PartitionAppendInfo PartitionAppendInfo; + typedef struct RelOptInfo { NodeTag type; @@ -643,9 +647,19 @@ typedef struct RelOptInfo PartitionScheme part_scheme; /* Partitioning scheme. */ int nparts; /* number of partitions */ struct PartitionBoundInfoData *boundinfo; /* Partition bounds */ + struct AppendRelInfo **part_appinfos; /* Array of AppendRelInfos of + * of partitioned, stored in the + * same order as of bounds */ struct RelOptInfo **part_rels; /* Array of RelOptInfos of partitions, * stored in the same order of bounds */ List **partexprs; /* Partition key expressions. */ + + /* + * For a partitioned relation, the following represents the identities + * of its live partitions (their appinfos) and some informations about + * the bounds that the live partitions satisfy. + */ + PartitionAppendInfo *painfo; } RelOptInfo; /* @@ -2085,6 +2099,58 @@ typedef struct PartitionedChildRelInfo List *child_rels; } PartitionedChildRelInfo; +/* Forward declarations, to avoid including other headers */ +typedef struct PartitionDispatchData *PartitionDispatch; +typedef struct PartitionBoundInfoData *PartitionBoundInfo; +typedef struct PartitionKeyData *PartitionKey; + +/* + * PartitionAppendInfo - Properties of partitions contained in the Append path + * of a given partitioned table + */ +typedef struct PartitionAppendInfo +{ + NodeTag type; + + /* + * List of AppendRelInfos of the table's partitions that satisfy a given + * query. + */ + List *live_partition_appinfos; + + /* + * RT indexes of live partitions that are partitioned tables themselves. + * This includes the RT index of the table itself. + */ + List *live_partitioned_rels; + + /* + * The following simply copies the pointer to boundinfo in the table's + * PartitionDesc. + */ + PartitionBoundInfo boundinfo; + + /* + * Indexes in the boundinfo->datums array of the smallest and the largest + * value of the partition key that the query allows. They are set by + * calling get_partitions_for_keys(). + */ + int min_datum_idx; + int max_datum_idx; + + /* + * Does this Append contain the null-accepting partition, if one exists + * and is allowed by the query's quals. + */ + bool contains_null_partition; + + /* + * Does this Append contain the default partition, if one exists and is + * allowed by the query's quals. + */ + bool contains_default_partition; +} PartitionAppendInfo; + /* * For each distinct placeholder expression generated during planning, we * store a PlaceHolderInfo node in the PlannerInfo node's placeholder_list. -- 2.11.0
From 2830316f6d0f2e74a3e7960290df3a75aad10720 Mon Sep 17 00:00:00 2001 From: amit <amitlangot...@gmail.com> Date: Tue, 22 Aug 2017 11:33:27 +0900 Subject: [PATCH 2/4] WIP: Interface changes for partition_bound_{cmp/bsearch} Introduces a notion of PartitionBoundCmpArg, which replaces the set of arguments void *probe and bool probe_is_bound of both partition_bound_cmp and partition_bound_bsearch. It wasn't possible before to specify the number of datums when a non-bound type of probe is passed. Slightly tweaking the existing interface to allow specifying the same seems awkward. So, instead encapsulate that into PartitionBoundCmpArg. Also, modify partition_rbound_datum_cmp to compare caller-specifed number of datums, instead of key->partnatts datums. --- src/backend/catalog/partition.c | 135 ++++++++++++++++++++++++++++------------ 1 file changed, 96 insertions(+), 39 deletions(-) diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c index ccf8a1fa67..0133748234 100644 --- a/src/backend/catalog/partition.c +++ b/src/backend/catalog/partition.c @@ -111,6 +111,30 @@ typedef struct PartitionRangeBound bool lower; /* this is the lower (vs upper) bound */ } PartitionRangeBound; +/* + * PartitionBoundCmpArg - Caller-defined argument to be passed to + * partition_bound_cmp() + * + * The first (fixed) argument involved in a comparison is the user-defined + * partition bound of a given existing partition, while an instance of the + * following struct describes either a new partition bound being compared + * against existing bounds (is_bound is true in that case and either lbound + * or rbound is set), or a new tuple's partition key specified in datums + * (ndatums = number of partition key columns). + */ +typedef struct PartitionBoundCmpArg +{ + bool is_bound; + union + { + PartitionListValue *lbound; + PartitionRangeBound *rbound; + } bound; + + Datum *datums; + int ndatums; +} PartitionBoundCmpArg; + static int32 qsort_partition_list_value_cmp(const void *a, const void *b, void *arg); static int32 qsort_partition_rbound_cmp(const void *a, const void *b, @@ -139,14 +163,15 @@ static int32 partition_rbound_cmp(PartitionKey key, bool lower1, PartitionRangeBound *b2); static int32 partition_rbound_datum_cmp(PartitionKey key, Datum *rb_datums, PartitionRangeDatumKind *rb_kind, - Datum *tuple_datums); + Datum *tuple_datums, int n_tuple_datums); static int32 partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo, - int offset, void *probe, bool probe_is_bound); + int offset, PartitionBoundCmpArg *arg); static int partition_bound_bsearch(PartitionKey key, PartitionBoundInfo boundinfo, - void *probe, bool probe_is_bound, bool *is_equal); + PartitionBoundCmpArg *arg, + bool *is_equal); static void get_partition_dispatch_recurse(Relation rel, Relation parent, List **pds, List **leaf_part_oids); @@ -755,10 +780,16 @@ check_new_partition_bound(char *relname, Relation parent, { int offset; bool equal; - + PartitionListValue lbound; + PartitionBoundCmpArg arg; + + memset(&lbound, 0, sizeof(PartitionListValue)); + lbound.value = val->constvalue; + memset(&arg, 0, sizeof(PartitionBoundCmpArg)); + arg.is_bound = true; + arg.bound.lbound = &lbound; offset = partition_bound_bsearch(key, boundinfo, - &val->constvalue, - true, &equal); + &arg, &equal); if (offset >= 0 && equal) { overlap = true; @@ -809,6 +840,7 @@ check_new_partition_bound(char *relname, Relation parent, PartitionBoundInfo boundinfo = partdesc->boundinfo; int offset; bool equal; + PartitionBoundCmpArg arg; Assert(boundinfo && boundinfo->strategy == PARTITION_STRATEGY_RANGE && @@ -830,8 +862,11 @@ check_new_partition_bound(char *relname, Relation parent, * since the index array is initialised with an extra -1 * at the end. */ - offset = partition_bound_bsearch(key, boundinfo, lower, - true, &equal); + memset(&arg, 0, sizeof(PartitionBoundCmpArg)); + arg.is_bound = true; + arg.bound.rbound = lower; + offset = partition_bound_bsearch(key, boundinfo, &arg, + &equal); if (boundinfo->indexes[offset + 1] < 0) { @@ -845,9 +880,9 @@ check_new_partition_bound(char *relname, Relation parent, { int32 cmpval; + arg.bound.rbound = upper; cmpval = partition_bound_cmp(key, boundinfo, - offset + 1, upper, - true); + offset + 1, &arg); if (cmpval < 0) { /* @@ -2380,12 +2415,15 @@ get_partition_for_tuple(PartitionDispatch *pd, { bool equal = false; int cur_offset; + PartitionBoundCmpArg arg; + memset(&arg, 0, sizeof(PartitionBoundCmpArg)); + arg.is_bound = false; + arg.datums = values; + arg.ndatums = key->partnatts; cur_offset = partition_bound_bsearch(key, partdesc->boundinfo, - values, - false, - &equal); + &arg, &equal); if (cur_offset >= 0 && equal) cur_index = partdesc->boundinfo->indexes[cur_offset]; } @@ -2397,6 +2435,7 @@ get_partition_for_tuple(PartitionDispatch *pd, range_partkey_has_null = false; int cur_offset; int i; + PartitionBoundCmpArg arg; /* * No range includes NULL, so this will be accepted by the @@ -2427,12 +2466,13 @@ get_partition_for_tuple(PartitionDispatch *pd, if (range_partkey_has_null) break; + memset(&arg, 0, sizeof(PartitionBoundCmpArg)); + arg.is_bound = false; + arg.datums = values; + arg.ndatums = key->partnatts; cur_offset = partition_bound_bsearch(key, partdesc->boundinfo, - values, - false, - &equal); - + &arg, &equal); /* * The offset returned is such that the bound at * cur_offset is less than or equal to the tuple value, so @@ -2629,12 +2669,12 @@ partition_rbound_cmp(PartitionKey key, static int32 partition_rbound_datum_cmp(PartitionKey key, Datum *rb_datums, PartitionRangeDatumKind *rb_kind, - Datum *tuple_datums) + Datum *tuple_datums, int n_tuple_datums) { int i; int32 cmpval = -1; - for (i = 0; i < key->partnatts; i++) + for (i = 0; i < n_tuple_datums; i++) { if (rb_kind[i] == PARTITION_RANGE_DATUM_MINVALUE) return -1; @@ -2656,11 +2696,11 @@ partition_rbound_datum_cmp(PartitionKey key, * partition_bound_cmp * * Return whether the bound at offset in boundinfo is <, =, or > the argument - * specified in *probe. + * specified in *arg. */ static int32 partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo, - int offset, void *probe, bool probe_is_bound) + int offset, PartitionBoundCmpArg *arg) { Datum *bound_datums = boundinfo->datums[offset]; int32 cmpval = -1; @@ -2668,17 +2708,35 @@ partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo, switch (key->strategy) { case PARTITION_STRATEGY_LIST: - cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[0], - key->partcollation[0], - bound_datums[0], - *(Datum *) probe)); - break; + { + Datum listdatum; + + if (arg->is_bound) + listdatum = arg->bound.lbound->value; + else + { + if (arg->ndatums >= 1) + listdatum = arg->datums[0]; + /* + * If no tuple datum to compare with the bound, consider + * the latter to be greater. + */ + else + return 1; + } + + cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[0], + key->partcollation[0], + bound_datums[0], + listdatum)); + break; + } case PARTITION_STRATEGY_RANGE: { PartitionRangeDatumKind *kind = boundinfo->kind[offset]; - if (probe_is_bound) + if (arg->is_bound) { /* * We need to pass whether the existing bound is a lower @@ -2689,12 +2747,13 @@ partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo, cmpval = partition_rbound_cmp(key, bound_datums, kind, lower, - (PartitionRangeBound *) probe); + arg->bound.rbound); } else cmpval = partition_rbound_datum_cmp(key, bound_datums, kind, - (Datum *) probe); + arg->datums, + arg->ndatums); break; } @@ -2708,20 +2767,19 @@ partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo, /* * Binary search on a collection of partition bounds. Returns greatest - * bound in array boundinfo->datums which is less than or equal to *probe. - * If all bounds in the array are greater than *probe, -1 is returned. + * bound in array boundinfo->datums which is less than or equal to *arg. + * If all bounds in the array are greater than *arg, -1 is returned. * - * *probe could either be a partition bound or a Datum array representing - * the partition key of a tuple being routed; probe_is_bound tells which. - * We pass that down to the comparison function so that it can interpret the - * contents of *probe accordingly. + * *arg could either be a partition bound or a Datum array representing + * the partition key of a tuple being routed. We simply pass that down to + * partition_bound_cmp where it is interpreted appropriately. * * *is_equal is set to whether the bound at the returned index is equal with - * *probe. + * *arg. */ static int partition_bound_bsearch(PartitionKey key, PartitionBoundInfo boundinfo, - void *probe, bool probe_is_bound, bool *is_equal) + PartitionBoundCmpArg *arg, bool *is_equal) { int lo, hi, @@ -2734,8 +2792,7 @@ partition_bound_bsearch(PartitionKey key, PartitionBoundInfo boundinfo, int32 cmpval; mid = (lo + hi + 1) / 2; - cmpval = partition_bound_cmp(key, boundinfo, mid, probe, - probe_is_bound); + cmpval = partition_bound_cmp(key, boundinfo, mid, arg); if (cmpval <= 0) { lo = mid; -- 2.11.0
From b39fe19609c4cdec4961455540cef50beebce33a Mon Sep 17 00:00:00 2001 From: amit <amitlangot...@gmail.com> Date: Tue, 22 Aug 2017 13:48:13 +0900 Subject: [PATCH 3/4] WIP: Implement get_partitions_for_keys() Disable constraint exclusion that occurs using internal partition constraints, so that it's apparent what the new partition-pruning code still needs to do to able to create a plan matching the plain the the traditional constraint exclusion based partition-pruning would result in. --- src/backend/catalog/partition.c | 210 ++++++++++++++++++++++++++++++++++- src/backend/optimizer/util/plancat.c | 4 + 2 files changed, 209 insertions(+), 5 deletions(-) diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c index 0133748234..9d4b7c1a7f 100644 --- a/src/backend/catalog/partition.c +++ b/src/backend/catalog/partition.c @@ -1391,8 +1391,6 @@ get_partition_dispatch_recurse(Relation rel, Relation parent, * datum that the query's bounding keys allow to be returned for the query. * Similarly, *max_datum_index. *null_partition_chosen returns whether * the null partition will be scanned. - * - * TODO: Implement. */ List * get_partitions_for_keys(Relation rel, @@ -1403,12 +1401,214 @@ get_partitions_for_keys(Relation rel, bool *null_partition_chosen, bool *default_partition_chosen) { + int i, + minoff, + maxoff; List *result = NIL; - int i; + PartitionKey partkey = RelationGetPartitionKey(rel); PartitionDesc partdesc = RelationGetPartitionDesc(rel); + PartitionBoundCmpArg arg; + bool is_equal, + scan_default = false; + int null_partition_idx = partdesc->boundinfo->null_index; - for (i = 0; i < partdesc->nparts; i++) - result = lappend_int(result, i); + *null_partition_chosen = false; + + /* + * Check if any of the scan keys are null. If so, return the only + * null-accepting partition if partdesc->boundinfo says there is one. + */ + for (i = 0; i < partkey->partnatts; i++) + { + if (keynullness[i] == IS_NULL) + { + if (null_partition_idx >= 0) + { + *null_partition_chosen = true; + result = list_make1_int(null_partition_idx); + } + else + result = NIL; + + return result; + } + } + + /* + * If query provides no quals, don't forget to scan the default partition. + */ + if (n_minkeys == 0 && n_maxkeys == 0) + scan_default = true; + + if (n_minkeys > 0 && partdesc->nparts > 0) + { + memset(&arg, 0, sizeof(PartitionBoundCmpArg)); + arg.datums = minkeys; + arg.ndatums = n_minkeys; + minoff = partition_bound_bsearch(partkey, partdesc->boundinfo, + &arg, &is_equal); + + if (is_equal && arg.ndatums < partkey->partnatts) + { + int32 cmpval; + + is_equal = false; + + do + { + if (min_inclusive) + minoff -= 1; + else + minoff += 1; + if (minoff < 0 || + minoff >= partdesc->boundinfo->ndatums) + break; + cmpval = partition_bound_cmp(partkey, partdesc->boundinfo, + minoff, &arg); + } while (cmpval == 0); + } + + /* Interpret the result per partition strategy. */ + switch (partkey->strategy) + { + case PARTITION_STRATEGY_LIST: + /* + * Found, but if the query may have asked us to exclude it. + */ + if (is_equal && !min_inclusive) + minoff++; + break; + + case PARTITION_STRATEGY_RANGE: + /* + * Records returned by the query will be > bounds[minoff], + * because min_scankey is >= bounds[minoff], that is, no + * records of the partition at minoff will be returned. Go + * to the next bound. + */ + if (minoff < partdesc->boundinfo->ndatums - 1) + minoff += 1; + + /* + * Make sure to skip a gap. + * Note: There are ndatums + 1 lots in the indexes array. + */ + if (partdesc->boundinfo->indexes[minoff] < 0 && + partdesc->boundinfo->indexes[minoff + 1] >= 0) + minoff += 1; + break; + } + } + else + minoff = 0; + + if (n_maxkeys > 0 && partdesc->nparts > 0) + { + memset(&arg, 0, sizeof(PartitionBoundCmpArg)); + arg.datums = maxkeys; + arg.ndatums = n_maxkeys; + maxoff = partition_bound_bsearch(partkey, partdesc->boundinfo, + &arg, &is_equal); + if (is_equal && arg.ndatums < partkey->partnatts) + { + int32 cmpval; + + is_equal = false; + + do + { + if (max_inclusive) + maxoff += 1; + else + maxoff -= 1; + if (maxoff < 0 || + maxoff >= partdesc->boundinfo->ndatums) + break; + cmpval = partition_bound_cmp(partkey, partdesc->boundinfo, + maxoff, &arg); + } while (cmpval == 0); + + /* Back up if went too far. */ + if (max_inclusive) + maxoff -= 1; + } + + /* Interpret the result per partition strategy. */ + switch (partkey->strategy) + { + case PARTITION_STRATEGY_LIST: + /* + * Found, but if the query may have asked us to exclude it. + */ + if (is_equal && !max_inclusive) + maxoff--; + break; + + case PARTITION_STRATEGY_RANGE: + /* + * Because bounds[maxoff] <= max_scankey, we may need to + * to consider the next partition as well, in addition to + * the partition at maxoff and earlier. + */ + if (!is_equal || max_inclusive) + maxoff += 1; + + /* Make sure to skip a gap. */ + if (partdesc->boundinfo->indexes[maxoff] < 0 && maxoff >= 1) + maxoff -= 1; + break; + } + } + else + maxoff = partdesc->boundinfo->ndatums - 1; + + *min_datum_index = minoff; + *max_datum_index = maxoff; + + switch (partkey->strategy) + { + case PARTITION_STRATEGY_LIST: + for (i = minoff; i <= maxoff; i++) + { + int partition_idx = partdesc->boundinfo->indexes[i]; + + /* + * Multiple values may belong to the same partition, so make + * sure we don't add the same partition index again. + */ + result = list_append_unique_int(result, partition_idx); + } + + /* If no bounding keys exist, include the null partition too. */ + if (null_partition_idx >= 0 && + (keynullness[0] == -1 || keynullness[0] != IS_NOT_NULL)) + { + *null_partition_chosen = true; + result = list_append_unique_int(result, null_partition_idx); + } + + break; + + case PARTITION_STRATEGY_RANGE: + for (i = minoff; i <= maxoff; i++) + { + int partition_idx = partdesc->boundinfo->indexes[i]; + + /* + * If a valid partition exists for this range, add its + * index, if not, the default partition (if any) would be + * covering that range, so request to include the same. + */ + if (partition_idx >= 0) + result = lappend_int(result, partition_idx); + else + scan_default = true; + } + break; + } + + if (scan_default && partdesc->boundinfo->default_index >= 0) + result = lappend_int(result, partdesc->boundinfo->default_index); return result; } diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c index 49578c0684..ef84dac7f2 100644 --- a/src/backend/optimizer/util/plancat.c +++ b/src/backend/optimizer/util/plancat.c @@ -1160,7 +1160,9 @@ get_relation_constraints(PlannerInfo *root, Index varno = rel->relid; Relation relation; TupleConstr *constr; +#ifdef USE_PARTITION_CONSTRAINT_FOR_PRUNING List *pcqual; +#endif /* * We assume the relation has already been safely locked. @@ -1246,6 +1248,7 @@ get_relation_constraints(PlannerInfo *root, } } +#ifdef USE_PARTITION_CONSTRAINT_FOR_PRUNING /* Append partition predicates, if any */ pcqual = RelationGetPartitionQual(relation); if (pcqual) @@ -1263,6 +1266,7 @@ get_relation_constraints(PlannerInfo *root, result = list_concat(result, pcqual); } +#endif heap_close(relation, NoLock); -- 2.11.0
From 5c0a9b4e102f4f6b44cd56a812f3d54855b45c6e Mon Sep 17 00:00:00 2001 From: amit <amitlangot...@gmail.com> Date: Tue, 5 Sep 2017 10:28:23 +0900 Subject: [PATCH 4/4] Add more tests for the new partitioning-related planning code --- src/test/regress/expected/partition.out | 465 ++++++++++++++++++++++++++++++++ src/test/regress/parallel_schedule | 2 +- src/test/regress/serial_schedule | 1 + src/test/regress/sql/partition.sql | 82 ++++++ 4 files changed, 549 insertions(+), 1 deletion(-) create mode 100644 src/test/regress/expected/partition.out create mode 100644 src/test/regress/sql/partition.sql diff --git a/src/test/regress/expected/partition.out b/src/test/regress/expected/partition.out new file mode 100644 index 0000000000..5242f4aa3d --- /dev/null +++ b/src/test/regress/expected/partition.out @@ -0,0 +1,465 @@ +-- +-- Test partitioning planner code +-- +create table lp (a char) partition by list (a); +create table lp_ef partition of lp for values in ('e', 'f'); +create table lp_ad partition of lp for values in ('a', 'd'); +create table lp_bc partition of lp for values in ('b', 'c'); +create table lp_null partition of lp for values in (null); +explain (costs off) select * from lp; + QUERY PLAN +--------------------------- + Append + -> Seq Scan on lp_ad + -> Seq Scan on lp_bc + -> Seq Scan on lp_ef + -> Seq Scan on lp_null +(5 rows) + +explain (costs off) select * from lp where a > 'a' and a < 'd'; + QUERY PLAN +----------------------------------------------------------- + Append + -> Seq Scan on lp_bc + Filter: ((a > 'a'::bpchar) AND (a < 'd'::bpchar)) +(3 rows) + +explain (costs off) select * from lp where a > 'a' and a <= 'd'; + QUERY PLAN +------------------------------------------------------------ + Append + -> Seq Scan on lp_bc + Filter: ((a > 'a'::bpchar) AND (a <= 'd'::bpchar)) + -> Seq Scan on lp_ad + Filter: ((a > 'a'::bpchar) AND (a <= 'd'::bpchar)) +(5 rows) + +explain (costs off) select * from lp where a = 'a'; + QUERY PLAN +----------------------------------- + Append + -> Seq Scan on lp_ad + Filter: (a = 'a'::bpchar) +(3 rows) + +explain (costs off) select * from lp where 'a' = a; /* commutates */ + QUERY PLAN +----------------------------------- + Append + -> Seq Scan on lp_ad + Filter: ('a'::bpchar = a) +(3 rows) + +explain (costs off) select * from lp where a is not null; + QUERY PLAN +--------------------------------- + Append + -> Seq Scan on lp_ad + Filter: (a IS NOT NULL) + -> Seq Scan on lp_bc + Filter: (a IS NOT NULL) + -> Seq Scan on lp_ef + Filter: (a IS NOT NULL) +(7 rows) + +explain (costs off) select * from lp where a is null; + QUERY PLAN +----------------------------- + Append + -> Seq Scan on lp_null + Filter: (a IS NULL) +(3 rows) + +create table rlp (a int, b varchar) partition by range (a); +create table rlp1 partition of rlp for values from (minvalue) to (1); +create table rlp2 partition of rlp for values from (1) to (10); +create table rlp3 (b varchar, a int) partition by list (b varchar_ops); +create table rlp3abcd partition of rlp3 for values in ('ab', 'cd'); +create table rlp3efgh partition of rlp3 for values in ('ef', 'gh'); +create table rlp3nullxy partition of rlp3 for values in (null, 'xy'); +alter table rlp attach partition rlp3 for values from (15) to (20); +create table rlp4 partition of rlp for values from (20) to (30); +create table rlp5 partition of rlp for values from (31) to (maxvalue); +explain (costs off) select * from rlp where a < 1; + QUERY PLAN +------------------------- + Append + -> Seq Scan on rlp1 + Filter: (a < 1) +(3 rows) + +explain (costs off) select * from rlp where 1 > a; /* commutates */ + QUERY PLAN +------------------------- + Append + -> Seq Scan on rlp1 + Filter: (1 > a) +(3 rows) + +explain (costs off) select * from rlp where a <= 1; + QUERY PLAN +-------------------------- + Append + -> Seq Scan on rlp1 + Filter: (a <= 1) + -> Seq Scan on rlp2 + Filter: (a <= 1) +(5 rows) + +explain (costs off) select * from rlp where a = 1; + QUERY PLAN +------------------------- + Append + -> Seq Scan on rlp2 + Filter: (a = 1) +(3 rows) + +explain (costs off) select * from rlp where a = 1::bigint; /* same as above */ + QUERY PLAN +----------------------------------- + Append + -> Seq Scan on rlp2 + Filter: (a = '1'::bigint) +(3 rows) + +explain (costs off) select * from rlp where a = 1::numeric; /* no pruning */ + QUERY PLAN +----------------------------------------------- + Append + -> Seq Scan on rlp1 + Filter: ((a)::numeric = '1'::numeric) + -> Seq Scan on rlp2 + Filter: ((a)::numeric = '1'::numeric) + -> Seq Scan on rlp3abcd + Filter: ((a)::numeric = '1'::numeric) + -> Seq Scan on rlp3efgh + Filter: ((a)::numeric = '1'::numeric) + -> Seq Scan on rlp3nullxy + Filter: ((a)::numeric = '1'::numeric) + -> Seq Scan on rlp4 + Filter: ((a)::numeric = '1'::numeric) + -> Seq Scan on rlp5 + Filter: ((a)::numeric = '1'::numeric) +(15 rows) + +explain (costs off) select * from rlp where a <= 10; + QUERY PLAN +--------------------------- + Append + -> Seq Scan on rlp1 + Filter: (a <= 10) + -> Seq Scan on rlp2 + Filter: (a <= 10) +(5 rows) + +explain (costs off) select * from rlp where a > 10; + QUERY PLAN +------------------------------ + Append + -> Seq Scan on rlp3abcd + Filter: (a > 10) + -> Seq Scan on rlp3efgh + Filter: (a > 10) + -> Seq Scan on rlp3nullxy + Filter: (a > 10) + -> Seq Scan on rlp4 + Filter: (a > 10) + -> Seq Scan on rlp5 + Filter: (a > 10) +(11 rows) + +explain (costs off) select * from rlp where a < 15; + QUERY PLAN +-------------------------- + Append + -> Seq Scan on rlp1 + Filter: (a < 15) + -> Seq Scan on rlp2 + Filter: (a < 15) +(5 rows) + +explain (costs off) select * from rlp where a <= 15; + QUERY PLAN +------------------------------ + Append + -> Seq Scan on rlp1 + Filter: (a <= 15) + -> Seq Scan on rlp2 + Filter: (a <= 15) + -> Seq Scan on rlp3abcd + Filter: (a <= 15) + -> Seq Scan on rlp3efgh + Filter: (a <= 15) + -> Seq Scan on rlp3nullxy + Filter: (a <= 15) +(11 rows) + +explain (costs off) select * from rlp where a > 15 and b = 'ab'; + QUERY PLAN +--------------------------------------------------------- + Append + -> Seq Scan on rlp3abcd + Filter: ((a > 15) AND ((b)::text = 'ab'::text)) + -> Seq Scan on rlp4 + Filter: ((a > 15) AND ((b)::text = 'ab'::text)) + -> Seq Scan on rlp5 + Filter: ((a > 15) AND ((b)::text = 'ab'::text)) +(7 rows) + +explain (costs off) select * from rlp where a = 16 and b is null; + QUERY PLAN +-------------------------------------------- + Append + -> Seq Scan on rlp3nullxy + Filter: ((b IS NULL) AND (a = 16)) +(3 rows) + +explain (costs off) select * from rlp where a = 16 and b is not null; + QUERY PLAN +------------------------------------------------ + Append + -> Seq Scan on rlp3abcd + Filter: ((b IS NOT NULL) AND (a = 16)) + -> Seq Scan on rlp3efgh + Filter: ((b IS NOT NULL) AND (a = 16)) + -> Seq Scan on rlp3nullxy + Filter: ((b IS NOT NULL) AND (a = 16)) +(7 rows) + +explain (costs off) select * from rlp where a is null; /* while we're on nulls */ + QUERY PLAN +-------------------------- + Result + One-Time Filter: false +(2 rows) + +explain (costs off) select * from rlp where a > 30; + QUERY PLAN +-------------------------- + Append + -> Seq Scan on rlp5 + Filter: (a > 30) +(3 rows) + +explain (costs off) select * from rlp where a = 30; /* empty */ + QUERY PLAN +-------------------------- + Result + One-Time Filter: false +(2 rows) + +explain (costs off) select * from rlp where a <= 31; + QUERY PLAN +------------------------------ + Append + -> Seq Scan on rlp1 + Filter: (a <= 31) + -> Seq Scan on rlp2 + Filter: (a <= 31) + -> Seq Scan on rlp3abcd + Filter: (a <= 31) + -> Seq Scan on rlp3efgh + Filter: (a <= 31) + -> Seq Scan on rlp3nullxy + Filter: (a <= 31) + -> Seq Scan on rlp4 + Filter: (a <= 31) + -> Seq Scan on rlp5 + Filter: (a <= 31) +(15 rows) + +-- multi-column keys +create table mc3p (a int, b int, c int) partition by range (a, abs(b), c); +create table mc3p0 partition of mc3p for values from (minvalue, minvalue, minvalue) to (1, 1, 1); +create table mc3p1 partition of mc3p for values from (1, 1, 1) to (10, 5, 10); +create table mc3p2 partition of mc3p for values from (10, 5, 10) to (10, 10, 10); +create table mc3p3 partition of mc3p for values from (10, 10, 10) to (10, 10, 20); +create table mc3p4 partition of mc3p for values from (10, 10, 20) to (10, maxvalue, maxvalue); +create table mc3p5 partition of mc3p for values from (11, 1, 1) to (20, 10, 10); +create table mc3p6 partition of mc3p for values from (20, 10, 10) to (20, 20, 20); +create table mc3p7 partition of mc3p for values from (20, 20, 20) to (maxvalue, maxvalue, maxvalue); +explain (costs off) select * from mc3p where a = 1; + QUERY PLAN +------------------------- + Append + -> Seq Scan on mc3p0 + Filter: (a = 1) + -> Seq Scan on mc3p1 + Filter: (a = 1) +(5 rows) + +explain (costs off) select * from mc3p where a = 1 and abs(b) < 1; + QUERY PLAN +-------------------------------------------- + Append + -> Seq Scan on mc3p0 + Filter: ((a = 1) AND (abs(b) < 1)) +(3 rows) + +explain (costs off) select * from mc3p where a = 1 and abs(b) = 1; + QUERY PLAN +-------------------------------------------- + Append + -> Seq Scan on mc3p0 + Filter: ((a = 1) AND (abs(b) = 1)) + -> Seq Scan on mc3p1 + Filter: ((a = 1) AND (abs(b) = 1)) +(5 rows) + +explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8; + QUERY PLAN +-------------------------------------------------------- + Append + -> Seq Scan on mc3p0 + Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1)) + -> Seq Scan on mc3p1 + Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1)) +(5 rows) + +explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35; + QUERY PLAN +----------------------------------------------------------------- + Append + -> Seq Scan on mc3p1 + Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35)) + -> Seq Scan on mc3p2 + Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35)) + -> Seq Scan on mc3p3 + Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35)) + -> Seq Scan on mc3p4 + Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35)) +(9 rows) + +explain (costs off) select * from mc3p where a > 10; + QUERY PLAN +-------------------------- + Append + -> Seq Scan on mc3p5 + Filter: (a > 10) + -> Seq Scan on mc3p6 + Filter: (a > 10) + -> Seq Scan on mc3p7 + Filter: (a > 10) +(7 rows) + +explain (costs off) select * from mc3p where a >= 10; + QUERY PLAN +--------------------------- + Append + -> Seq Scan on mc3p1 + Filter: (a >= 10) + -> Seq Scan on mc3p2 + Filter: (a >= 10) + -> Seq Scan on mc3p3 + Filter: (a >= 10) + -> Seq Scan on mc3p4 + Filter: (a >= 10) + -> Seq Scan on mc3p5 + Filter: (a >= 10) + -> Seq Scan on mc3p6 + Filter: (a >= 10) + -> Seq Scan on mc3p7 + Filter: (a >= 10) +(15 rows) + +explain (costs off) select * from mc3p where a < 10; + QUERY PLAN +-------------------------- + Append + -> Seq Scan on mc3p0 + Filter: (a < 10) + -> Seq Scan on mc3p1 + Filter: (a < 10) +(5 rows) + +explain (costs off) select * from mc3p where a <= 10 and abs(b) < 10; + QUERY PLAN +----------------------------------------------- + Append + -> Seq Scan on mc3p0 + Filter: ((a <= 10) AND (abs(b) < 10)) + -> Seq Scan on mc3p1 + Filter: ((a <= 10) AND (abs(b) < 10)) + -> Seq Scan on mc3p2 + Filter: ((a <= 10) AND (abs(b) < 10)) +(7 rows) + +explain (costs off) select * from mc3p where a = 11 and abs(b) = 0; /* empty */ + QUERY PLAN +-------------------------- + Result + One-Time Filter: false +(2 rows) + +explain (costs off) select * from mc3p where a = 20 and abs(b) = 10 and c = 100; + QUERY PLAN +------------------------------------------------------------ + Append + -> Seq Scan on mc3p6 + Filter: ((a = 20) AND (c = 100) AND (abs(b) = 10)) +(3 rows) + +explain (costs off) select * from mc3p where a > 20; + QUERY PLAN +-------------------------- + Append + -> Seq Scan on mc3p7 + Filter: (a > 20) +(3 rows) + +explain (costs off) select * from mc3p where a >= 20; + QUERY PLAN +--------------------------- + Append + -> Seq Scan on mc3p5 + Filter: (a >= 20) + -> Seq Scan on mc3p6 + Filter: (a >= 20) + -> Seq Scan on mc3p7 + Filter: (a >= 20) +(7 rows) + +-- XXX - '<>' clauses cannot be handled yet +explain (costs off) select * from lp where a <> 'a'; + QUERY PLAN +------------------------------------ + Append + -> Seq Scan on lp_ad + Filter: (a <> 'a'::bpchar) + -> Seq Scan on lp_bc + Filter: (a <> 'a'::bpchar) + -> Seq Scan on lp_ef + Filter: (a <> 'a'::bpchar) +(7 rows) + +-- XXX - redundant clause elimination does not happen yet +explain (costs off) select * from mc3p where a = 10 and a > 1; + QUERY PLAN +---------------------------------------- + Append + -> Seq Scan on mc3p0 + Filter: ((a > 1) AND (a = 10)) + -> Seq Scan on mc3p1 + Filter: ((a > 1) AND (a = 10)) + -> Seq Scan on mc3p2 + Filter: ((a > 1) AND (a = 10)) + -> Seq Scan on mc3p3 + Filter: ((a > 1) AND (a = 10)) + -> Seq Scan on mc3p4 + Filter: ((a > 1) AND (a = 10)) +(11 rows) + +-- XXX - the OR clauses don't contribute to partition-pruning yet +explain (costs off) select * from rlp3 where b = 'ab' or b = 'ef'; + QUERY PLAN +------------------------------------------------------------------------ + Append + -> Seq Scan on rlp3abcd + Filter: (((b)::text = 'ab'::text) OR ((b)::text = 'ef'::text)) + -> Seq Scan on rlp3efgh + Filter: (((b)::text = 'ab'::text) OR ((b)::text = 'ef'::text)) + -> Seq Scan on rlp3nullxy + Filter: (((b)::text = 'ab'::text) OR ((b)::text = 'ef'::text)) +(7 rows) + +drop table lp, rlp, mc3p; diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule index 2fd3f2b1b1..2eb81fcf41 100644 --- a/src/test/regress/parallel_schedule +++ b/src/test/regress/parallel_schedule @@ -60,7 +60,7 @@ test: create_index create_view # ---------- # Another group of parallel tests # ---------- -test: create_aggregate create_function_3 create_cast constraints triggers inherit create_table_like typed_table vacuum drop_if_exists updatable_views rolenames roleattributes create_am hash_func +test: create_aggregate create_function_3 create_cast constraints triggers inherit create_table_like typed_table vacuum drop_if_exists updatable_views rolenames roleattributes create_am hash_func partition # ---------- # sanity_check does a vacuum, affecting the sort order of SELECT * diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule index 76b0de30a7..6611662149 100644 --- a/src/test/regress/serial_schedule +++ b/src/test/regress/serial_schedule @@ -71,6 +71,7 @@ test: create_cast test: constraints test: triggers test: inherit +test: partition test: create_table_like test: typed_table test: vacuum diff --git a/src/test/regress/sql/partition.sql b/src/test/regress/sql/partition.sql new file mode 100644 index 0000000000..4b562a655b --- /dev/null +++ b/src/test/regress/sql/partition.sql @@ -0,0 +1,82 @@ +-- +-- Test partitioning planner code +-- +create table lp (a char) partition by list (a); +create table lp_ef partition of lp for values in ('e', 'f'); +create table lp_ad partition of lp for values in ('a', 'd'); +create table lp_bc partition of lp for values in ('b', 'c'); +create table lp_null partition of lp for values in (null); +explain (costs off) select * from lp; +explain (costs off) select * from lp where a > 'a' and a < 'd'; +explain (costs off) select * from lp where a > 'a' and a <= 'd'; +explain (costs off) select * from lp where a = 'a'; +explain (costs off) select * from lp where 'a' = a; /* commutates */ +explain (costs off) select * from lp where a is not null; +explain (costs off) select * from lp where a is null; + +create table rlp (a int, b varchar) partition by range (a); +create table rlp1 partition of rlp for values from (minvalue) to (1); +create table rlp2 partition of rlp for values from (1) to (10); + +create table rlp3 (b varchar, a int) partition by list (b varchar_ops); +create table rlp3abcd partition of rlp3 for values in ('ab', 'cd'); +create table rlp3efgh partition of rlp3 for values in ('ef', 'gh'); +create table rlp3nullxy partition of rlp3 for values in (null, 'xy'); +alter table rlp attach partition rlp3 for values from (15) to (20); + +create table rlp4 partition of rlp for values from (20) to (30); +create table rlp5 partition of rlp for values from (31) to (maxvalue); + +explain (costs off) select * from rlp where a < 1; +explain (costs off) select * from rlp where 1 > a; /* commutates */ +explain (costs off) select * from rlp where a <= 1; +explain (costs off) select * from rlp where a = 1; +explain (costs off) select * from rlp where a = 1::bigint; /* same as above */ +explain (costs off) select * from rlp where a = 1::numeric; /* no pruning */ +explain (costs off) select * from rlp where a <= 10; +explain (costs off) select * from rlp where a > 10; +explain (costs off) select * from rlp where a < 15; +explain (costs off) select * from rlp where a <= 15; +explain (costs off) select * from rlp where a > 15 and b = 'ab'; +explain (costs off) select * from rlp where a = 16 and b is null; +explain (costs off) select * from rlp where a = 16 and b is not null; +explain (costs off) select * from rlp where a is null; /* while we're on nulls */ +explain (costs off) select * from rlp where a > 30; +explain (costs off) select * from rlp where a = 30; /* empty */ +explain (costs off) select * from rlp where a <= 31; + +-- multi-column keys +create table mc3p (a int, b int, c int) partition by range (a, abs(b), c); +create table mc3p0 partition of mc3p for values from (minvalue, minvalue, minvalue) to (1, 1, 1); +create table mc3p1 partition of mc3p for values from (1, 1, 1) to (10, 5, 10); +create table mc3p2 partition of mc3p for values from (10, 5, 10) to (10, 10, 10); +create table mc3p3 partition of mc3p for values from (10, 10, 10) to (10, 10, 20); +create table mc3p4 partition of mc3p for values from (10, 10, 20) to (10, maxvalue, maxvalue); +create table mc3p5 partition of mc3p for values from (11, 1, 1) to (20, 10, 10); +create table mc3p6 partition of mc3p for values from (20, 10, 10) to (20, 20, 20); +create table mc3p7 partition of mc3p for values from (20, 20, 20) to (maxvalue, maxvalue, maxvalue); + +explain (costs off) select * from mc3p where a = 1; +explain (costs off) select * from mc3p where a = 1 and abs(b) < 1; +explain (costs off) select * from mc3p where a = 1 and abs(b) = 1; +explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8; +explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35; +explain (costs off) select * from mc3p where a > 10; +explain (costs off) select * from mc3p where a >= 10; +explain (costs off) select * from mc3p where a < 10; +explain (costs off) select * from mc3p where a <= 10 and abs(b) < 10; +explain (costs off) select * from mc3p where a = 11 and abs(b) = 0; /* empty */ +explain (costs off) select * from mc3p where a = 20 and abs(b) = 10 and c = 100; +explain (costs off) select * from mc3p where a > 20; +explain (costs off) select * from mc3p where a >= 20; + +-- XXX - '<>' clauses cannot be handled yet +explain (costs off) select * from lp where a <> 'a'; + +-- XXX - redundant clause elimination does not happen yet +explain (costs off) select * from mc3p where a = 10 and a > 1; + +-- XXX - the OR clauses don't contribute to partition-pruning yet +explain (costs off) select * from rlp3 where b = 'ab' or b = 'ef'; + +drop table lp, rlp, mc3p; -- 2.11.0
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers