On 2017/09/15 11:16, Amit Langote wrote: > I will post rebased patches later today, although I think the overall > design of the patch on the planner side of things is not quite there yet. > Of course, your and others' feedback is greatly welcome.
Rebased patches attached. Because Dilip complained earlier today about clauses of the form (const op var) not causing partition-pruning, I've added code to commute the clause where it is required. Some other previously mentioned limitations remain -- no handling of OR clauses, no elimination of redundant clauses for given partitioning column, etc. A note about 0001: this patch overlaps with 0003-Canonical-partition-scheme.patch from the partitionwise-join patch series that Ashutosh Bapat posted yesterday [1]. Because I implemented the planner-portion of this patch based on what 0001 builds, I'm posting it here. It might actually turn out that we will review and commit 0003-Canonical-partition-scheme.patch on that thread, but meanwhile apply 0001 if you want to play with the later patches. I would certainly like to review 0003-Canonical-partition-scheme.patch myself, but won't be able to immediately (see below). > Also, I must inform to all of those who're looking at this thread that I > won't be able to respond to emails from tomorrow (9/16, Sat) until 9/23, > Sat, due to some personal business. To remind. Thanks, Amit [1] https://www.postgresql.org/message-id/CAFiTN-skmaqeCVaoAHCBqe2DyfO3f6sgdtEjHWrUgi0kV1yPLQ%40mail.gmail.com
From ff9ccd8df6555cfca31e54e22293ac1613db327c Mon Sep 17 00:00:00 2001 From: amit <amitlangot...@gmail.com> Date: Wed, 13 Sep 2017 18:24:55 +0900 Subject: [PATCH 1/5] Some optimizer data structures for partitioned rels Nobody uses it though. --- src/backend/optimizer/util/plancat.c | 120 +++++++++++++++++++++++++++++++ src/backend/optimizer/util/relnode.c | 20 ++++++ src/include/nodes/nodes.h | 1 + src/include/nodes/relation.h | 135 +++++++++++++++++++++++++++++++++++ src/include/optimizer/plancat.h | 2 + 5 files changed, 278 insertions(+) diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c index a1ebd4acc8..f7e3a1df5f 100644 --- a/src/backend/optimizer/util/plancat.c +++ b/src/backend/optimizer/util/plancat.c @@ -68,6 +68,8 @@ static List *get_relation_constraints(PlannerInfo *root, static List *build_index_tlist(PlannerInfo *root, IndexOptInfo *index, Relation heapRelation); static List *get_relation_statistics(RelOptInfo *rel, Relation relation); +static void get_relation_partition_info(PlannerInfo *root, RelOptInfo *rel, + Relation relation); /* * get_relation_info - @@ -420,6 +422,10 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent, /* Collect info about relation's foreign keys, if relevant */ get_relation_foreign_keys(root, rel, relation, inhparent); + /* Collect partitioning info, if relevant. */ + if (relation->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) + get_relation_partition_info(root, rel, relation); + heap_close(relation, NoLock); /* @@ -1802,3 +1808,117 @@ has_row_triggers(PlannerInfo *root, Index rti, CmdType event) heap_close(relation, NoLock); return result; } + +static void +get_relation_partition_info(PlannerInfo *root, RelOptInfo *rel, + Relation relation) +{ + int i; + ListCell *l; + PartitionKey key = RelationGetPartitionKey(relation); + PartitionDesc partdesc = RelationGetPartitionDesc(relation); + + rel->part_scheme = find_partition_scheme(root, relation); + rel->partexprs = (List **) palloc0(key->partnatts * sizeof(List *)); + + l = list_head(key->partexprs); + for (i = 0; i < key->partnatts; i++) + { + Expr *keyCol; + + if (key->partattrs[i] != 0) + { + keyCol = (Expr *) makeVar(rel->relid, + key->partattrs[i], + key->parttypid[i], + key->parttypmod[i], + key->parttypcoll[i], + 0); + } + else + { + if (l == NULL) + elog(ERROR, "wrong number of partition key expressions"); + keyCol = (Expr *) copyObject(lfirst(l)); + l = lnext(l); + } + + rel->partexprs[i] = list_make1(keyCol); + } + + /* Values are filled in build_simple_rel(). */ + rel->child_appinfos = (AppendRelInfo **) palloc0(partdesc->nparts * + sizeof(AppendRelInfo *)); + + /* + * A PartitionAppendInfo to map this table to its immediate partitions + * that will be scanned by this query. At the same time, it records the + * table's partitioning properties reflecting any partition-pruning that + * might occur to satisfy the query. Rest of the fields are set in + * get_rel_partitions() and set_append_rel_size(). + */ + rel->painfo = makeNode(PartitionAppendInfo); + rel->painfo->boundinfo = partdesc->boundinfo; +} + +/* + * find_partition_scheme + * + * The function returns a canonical partition scheme which exactly matches the + * partitioning scheme of the given relation if one exists in the list of + * canonical partitioning schemes maintained in PlannerInfo. If none of the + * existing partitioning schemes match, the function creates a canonical + * partition scheme and adds it to the list. + * + * For an unpartitioned table or for a multi-level partitioned table it returns + * NULL. See comments in the function for more details. + */ +PartitionScheme +find_partition_scheme(PlannerInfo *root, Relation relation) +{ + ListCell *lc; + PartitionKey key = RelationGetPartitionKey(relation); + char strategy = key->strategy; + int partnatts = key->partnatts; + PartitionScheme part_scheme = NULL; + + /* Search for a matching partition scheme and return if found one. */ + foreach(lc, root->partition_schemes) + { + part_scheme = lfirst(lc); + + /* Match various partitioning attributes. */ + if (strategy != part_scheme->strategy || + partnatts != part_scheme->partnatts || + memcmp(key->parttypid, part_scheme->parttypid, + sizeof(Oid) * partnatts) != 0 || + memcmp(key->parttypmod, part_scheme->parttypmod, + sizeof(int32) * partnatts) != 0 || + memcmp(key->partcollation, part_scheme->partcollation, + sizeof(Oid) * partnatts) != 0 || + memcmp(key->partopfamily, part_scheme->partopfamily, + sizeof(Oid) * partnatts) != 0 || + memcmp(key->partopcintype, part_scheme->partopcintype, + sizeof(Oid) * partnatts) != 0) + continue; + + /* Found a matching partition scheme. */ + return part_scheme; + } + + /* Did not find matching partition scheme. Create one. */ + part_scheme = (PartitionScheme) palloc0(sizeof(PartitionSchemeData)); + + part_scheme->strategy = strategy; + part_scheme->partnatts = partnatts; + part_scheme->parttypid = key->parttypid; + part_scheme->parttypmod = key->parttypmod; + part_scheme->partcollation = key->partcollation; + part_scheme->partopfamily = key->partopfamily; + part_scheme->partopcintype = key->partopcintype; + + /* Add the partitioning scheme to PlannerInfo. */ + root->partition_schemes = lappend(root->partition_schemes, part_scheme); + + return part_scheme; +} diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c index c7b2695ebb..f0973b83b9 100644 --- a/src/backend/optimizer/util/relnode.c +++ b/src/backend/optimizer/util/relnode.c @@ -17,6 +17,7 @@ #include <limits.h> #include "miscadmin.h" +#include "catalog/pg_class.h" #include "optimizer/clauses.h" #include "optimizer/cost.h" #include "optimizer/pathnode.h" @@ -163,6 +164,11 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent) else rel->top_parent_relids = NULL; + rel->child_appinfos = NULL; + rel->part_scheme = NULL; + rel->partexprs = NULL; + rel->painfo = NULL; + /* Check type of rtable entry */ switch (rte->rtekind) { @@ -218,7 +224,18 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent) if (rte->inh) { ListCell *l; + AppendRelInfo **child_appinfos = NULL; + int i; + if (rte->relkind == RELKIND_PARTITIONED_TABLE) + { + Assert(rel->part_scheme != NULL); + Assert(rel->child_appinfos != NULL); + Assert(rel->painfo != NULL); + child_appinfos = rel->child_appinfos; + } + + i = 0; foreach(l, root->append_rel_list) { AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l); @@ -229,6 +246,9 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent) (void) build_simple_rel(root, appinfo->child_relid, rel); + + if (child_appinfos) + child_appinfos[i++] = appinfo; } } diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h index 27bd4f3363..63196a1211 100644 --- a/src/include/nodes/nodes.h +++ b/src/include/nodes/nodes.h @@ -261,6 +261,7 @@ typedef enum NodeTag T_SpecialJoinInfo, T_AppendRelInfo, T_PartitionedChildRelInfo, + T_PartitionAppendInfo, T_PlaceHolderInfo, T_MinMaxAggInfo, T_PlannerParamItem, diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h index d50ff55681..0f4996b424 100644 --- a/src/include/nodes/relation.h +++ b/src/include/nodes/relation.h @@ -266,6 +266,8 @@ typedef struct PlannerInfo List *distinct_pathkeys; /* distinctClause pathkeys, if any */ List *sort_pathkeys; /* sortClause pathkeys, if any */ + List *partition_schemes; /* List of PartitionScheme objects. */ + List *initial_rels; /* RelOptInfos we are now trying to join */ /* Use fetch_upper_rel() to get any particular upper rel */ @@ -326,6 +328,48 @@ typedef struct PlannerInfo ((root)->simple_rte_array ? (root)->simple_rte_array[rti] : \ rt_fetch(rti, (root)->parse->rtable)) +/* + * Partitioning scheme + * Structure to hold partitioning scheme for a given relation. + * + * Multiple relations may be partitioned in the same way. The relations + * resulting from joining such relations may be partitioned in the same way as + * the joining relations. Similarly, relations derived from such relations by + * grouping, sorting may be partitioned in the same way as the underlying scan + * relations. All such relations partitioned in the same way share the + * partitioning scheme. + * + * PlannerInfo stores a list of distinct "canonical" partitioning schemes. + * RelOptInfo of a partitioned relation holds the pointer to "canonical" + * partitioning scheme. + * + * We store opclass declared input data types instead of partition key + * datatypes since those are the ones used to compare partition bounds instead + * of actual partition key data types. Since partition key data types and the + * opclass declared input data types are expected to be binary compatible (per + * ResolveOpClass()), both of those should have same byval and length + * properties. + * + * The structure caches information about partition key data type to be used + * while matching partition bounds. While comparing partition schemes we don't + * need to compare this information as it should be same when opclass declared + * input data types are same for two partitioned relations. + */ +typedef struct PartitionSchemeData +{ + char strategy; /* Partitioning strategy */ + int16 partnatts; /* Number of partitioning attributes */ + + /* The following arrays each have partnatts members. */ + Oid *parttypid; /* Type OIDs */ + int32 *parttypmod; /* Typemod values */ + Oid *partcollation; /* Partitioning collation */ + Oid *partopfamily; /* Operator family OIDs */ + Oid *partopcintype; /* Operator class-declared input type OIDs */ +} PartitionSchemeData; + +typedef struct PartitionSchemeData *PartitionScheme; + /*---------- * RelOptInfo @@ -515,6 +559,9 @@ typedef enum RelOptKind /* Is the given relation an "other" relation? */ #define IS_OTHER_REL(rel) ((rel)->reloptkind == RELOPT_OTHER_MEMBER_REL) +typedef struct AppendRelInfo AppendRelInfo; +typedef struct PartitionAppendInfo PartitionAppendInfo; + typedef struct RelOptInfo { NodeTag type; @@ -592,6 +639,42 @@ typedef struct RelOptInfo /* used by "other" relations */ Relids top_parent_relids; /* Relids of topmost parents */ + + /* Fields set for partitioned relations */ + + /* + * Information about the partitioning attributes, such as the number of + * attributes, arrays containing per-attribute type/tpymod, partitioning + * collation, operator family OIDs, etc. + */ + PartitionScheme part_scheme; + + /* + * Following contains the exact identities of the individual partitioning + * attributes. For example, if the attribute is a table's column, then + * it will be represented herein by a Var node for the same. This is + * structured as an array of Lists with part_scheme->partnatts members, + * with each list containing the expression(s) corresponding to the ith + * partitioning attribute (0 <= i < part_schem->partnatts) of this rel. + * For baserels, there is just a single expression in each slot (the ith + * list) of the array, because it corresponds to just one table. But for + * a joinrel, there will be as many expressions as there are tables + * involved in that joinrel. We have to do it that way, because in the + * joinrel case, the same corresponding partitioning attribute may have + * different identities in different tables involved in the join; for + * example, a Var node's varno will differ and so might varattnos. + */ + List **partexprs; + + /* AppendRelInfos of *all* partitions of the table. */ + AppendRelInfo **child_appinfos; + + /* + * For a partitioned relation, the following represents the identities + * of its live partition (their RT indexes) and some informations about + * the bounds that the live partitions satisfy. + */ + PartitionAppendInfo *painfo; } RelOptInfo; /* @@ -2031,6 +2114,58 @@ typedef struct PartitionedChildRelInfo List *child_rels; } PartitionedChildRelInfo; +/* Forward declarations, to avoid including other headers */ +typedef struct PartitionDispatchData *PartitionDispatch; +typedef struct PartitionBoundInfoData *PartitionBoundInfo; +typedef struct PartitionKeyData *PartitionKey; + +/* + * PartitionAppendInfo - Properties of partitions contained in the Append path + * of a given partitioned table + */ +typedef struct PartitionAppendInfo +{ + NodeTag type; + + /* + * List of AppendRelInfos of the table's partitions that satisfy a given + * query. + */ + List *live_partition_appinfos; + + /* + * RT indexes of live partitions that are partitioned tables themselves. + * This includes the RT index of the table itself. + */ + List *live_partitioned_rels; + + /* + * The following simply copies the pointer to boundinfo in the table's + * PartitionDesc. + */ + PartitionBoundInfo boundinfo; + + /* + * Indexes in the boundinfo->datums array of the smallest and the largest + * value of the partition key that the query allows. They are set by + * calling get_partitions_for_keys(). + */ + int min_datum_idx; + int max_datum_idx; + + /* + * Does this Append contain the null-accepting partition, if one exists + * and is allowed by the query's quals. + */ + bool contains_null_partition; + + /* + * Does this Append contain the default partition, if one exists and is + * allowed by the query's quals. + */ + bool contains_default_partition; +} PartitionAppendInfo; + /* * For each distinct placeholder expression generated during planning, we * store a PlaceHolderInfo node in the PlannerInfo node's placeholder_list. diff --git a/src/include/optimizer/plancat.h b/src/include/optimizer/plancat.h index 71f0faf938..c45db074c6 100644 --- a/src/include/optimizer/plancat.h +++ b/src/include/optimizer/plancat.h @@ -56,5 +56,7 @@ extern Selectivity join_selectivity(PlannerInfo *root, SpecialJoinInfo *sjinfo); extern bool has_row_triggers(PlannerInfo *root, Index rti, CmdType event); +extern PartitionScheme find_partition_scheme(PlannerInfo *root, + Relation relation); #endif /* PLANCAT_H */ -- 2.11.0
From 24bc6a3428730e24b04b2b1282960e45ffeb0467 Mon Sep 17 00:00:00 2001 From: amit <amitlangot...@gmail.com> Date: Tue, 22 Aug 2017 17:31:42 +0900 Subject: [PATCH 2/5] WIP: planner-side changes for partition-pruning Firstly, this adds a stub get_partitions_for_keys() in partition.c with appropriate interface for the caller to specify bounding scan keys, along with other information about the scan keys extracted from the query, such as NULL-ness of the keys, inclusive-ness, etc. More importantly, this implements the planner-side logic to extract bounding scan keys to be passed to get_partitions_for_keys. That is, it will go through rel->baserestrictinfo and match individual clauses to partition keys and construct lower bound and upper bound tuples, which may cover only a prefix of a multi-column partition key. A bunch of smarts are still missing when mapping the clause operands with keys. For example, code to match a clause is specifed as (constant op var) doesn't exist. Also, redundant keys are not eliminated, for example, a combination of clauses a = 10 and a > 1 will cause the later clause a > 1 taking over and resulting in needless scanning of partitions containing values a > 1 and a < 10. ...constraint exclusion is still used, because get_partitions_for_keys is just a stub... --- src/backend/catalog/partition.c | 43 ++++ src/backend/optimizer/path/allpaths.c | 358 ++++++++++++++++++++++++++++++---- src/include/catalog/partition.h | 9 + 3 files changed, 371 insertions(+), 39 deletions(-) diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c index 1ab6dba7ae..ccf8a1fa67 100644 --- a/src/backend/catalog/partition.c +++ b/src/backend/catalog/partition.c @@ -1335,6 +1335,49 @@ get_partition_dispatch_recurse(Relation rel, Relation parent, } } +/* + * get_partitions_for_keys + * Returns the list of indexes of rel's partitions that will need to be + * scanned given the bounding scan keys. + * + * Each value in the returned list can be used as an index into the oids array + * of the partition descriptor. + * + * Inputs: + * keynullness contains between 0 and (key->partnatts - 1) values, each + * telling what kind of NullTest has been applies to the corresponding + * partition key column. minkeys represents the lower bound on the partition + * the key of the records that the query will return, while maxkeys + * represents upper bound. min_inclusive and max_inclusive tell whether the + * bounds specified minkeys and maxkeys is inclusive, respectively. + * + * Other outputs: + * *min_datum_index will return the index in boundinfo->datums of the first + * datum that the query's bounding keys allow to be returned for the query. + * Similarly, *max_datum_index. *null_partition_chosen returns whether + * the null partition will be scanned. + * + * TODO: Implement. + */ +List * +get_partitions_for_keys(Relation rel, + NullTestType *keynullness, + Datum *minkeys, int n_minkeys, bool min_inclusive, + Datum *maxkeys, int n_maxkeys, bool max_inclusive, + int *min_datum_index, int *max_datum_index, + bool *null_partition_chosen, + bool *default_partition_chosen) +{ + List *result = NIL; + int i; + PartitionDesc partdesc = RelationGetPartitionDesc(rel); + + for (i = 0; i < partdesc->nparts; i++) + result = lappend_int(result, i); + + return result; +} + /* Module-local functions */ /* diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c index 5b746a906a..6e5efe98f9 100644 --- a/src/backend/optimizer/path/allpaths.c +++ b/src/backend/optimizer/path/allpaths.c @@ -20,6 +20,7 @@ #include "access/sysattr.h" #include "access/tsmapi.h" +#include "catalog/partition.h" #include "catalog/pg_class.h" #include "catalog/pg_operator.h" #include "catalog/pg_proc.h" @@ -846,6 +847,251 @@ set_foreign_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte) } /* + * get_rel_partitions + * Return the list of partitions of rel that pass the query clauses + * + * Returned list contains the AppendInfos of the chosen partitions. + */ +static List * +get_rel_partitions(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte) +{ + Relation parent = heap_open(rte->relid, NoLock); + PartitionDesc partdesc = RelationGetPartitionDesc(parent); + List *indexes; + List *result = NIL; + ListCell *lc1, + *lc2; + int keyPos; + List *matchedclauses[PARTITION_MAX_KEYS]; + NullTestType keynullness[PARTITION_MAX_KEYS]; + Datum minkeys[PARTITION_MAX_KEYS], + maxkeys[PARTITION_MAX_KEYS]; + bool need_next_min, + need_next_max, + minkey_set[PARTITION_MAX_KEYS], + maxkey_set[PARTITION_MAX_KEYS], + min_incl, + max_incl; + int n_minkeys = 0, + n_maxkeys = 0, + i; + + /* + * Match individual OpExprs in the query's restriction with individual + * partition key columns. There is one list per key. + */ + memset(keynullness, -1, sizeof(keynullness)); + memset(matchedclauses, 0, sizeof(matchedclauses)); + keyPos = 0; + for (i = 0; i < rel->part_scheme->partnatts; i++) + { + Node *partkey = linitial(rel->partexprs[i]); + + foreach(lc2, rel->baserestrictinfo) + { + RestrictInfo *rinfo = lfirst(lc2); + Expr *clause = rinfo->clause; + + if (is_opclause(clause)) + { + Node *leftop = get_leftop(clause), + *rightop = get_rightop(clause); + Oid expr_op = ((OpExpr *) clause)->opno; + + if (IsA(leftop, RelabelType)) + leftop = (Node *) ((RelabelType *) leftop)->arg; + if (IsA(rightop, RelabelType)) + rightop = (Node *) ((RelabelType *) rightop)->arg; + + if (equal(leftop, partkey)) + { + matchedclauses[keyPos] = lappend(matchedclauses[keyPos], + clause); + /* A strict operator implies NOT NULL argument. */ + keynullness[keyPos] = IS_NOT_NULL; + } + else if (equal(rightop, partkey)) + { + Oid commutator = get_commutator(expr_op); + + if (OidIsValid(commutator)) + { + OpExpr *commutated_expr; + + /* + * Generate a commutated copy of the expression, but + * try to make it look valid, because we only want + * it to put the constant operand in a place that the + * following code knows as the only place to find it. + */ + commutated_expr = (OpExpr *) copyObject(clause); + commutated_expr->opno = commutator; /* really? */ + commutated_expr->args = list_make2(rightop, leftop); + matchedclauses[keyPos] = + lappend(matchedclauses[keyPos], + commutated_expr); + /* A strict operator implies NOT NULL argument. */ + keynullness[keyPos] = IS_NOT_NULL; + } + } + } + else if (IsA(clause, NullTest)) + { + NullTest *nulltest = (NullTest *) clause; + Node *arg = (Node *) nulltest->arg; + + if (equal(arg, partkey)) + keynullness[keyPos] = nulltest->nulltesttype; + } + } + + /* Onto finding clauses matching the next partition key. */ + keyPos++; + } + + /* + * Determine the min keys and the max keys using btree semantics-based + * interpretation of the clauses' operators. + */ + + /* + * XXX - There should be a step similar to _bt_preprocess_keys() here, + * to eliminate any redundant scan keys for a given partition column. For + * example, among a <= 4 and a <= 5, we can only keep a <= 4 for being + * more restrictive and discard a <= 5. While doing that, we can also + * check to see if there exists a contradictory combination of scan keys + * that makes the query trivially false for all records in the table. + */ + memset(minkeys, 0, sizeof(minkeys)); + memset(maxkeys, 0, sizeof(maxkeys)); + memset(minkey_set, false, sizeof(minkey_set)); + memset(maxkey_set, false, sizeof(maxkey_set)); + need_next_min = true; + need_next_max = true; + for (i = 0; i < rel->part_scheme->partnatts; i++) + { + /* + * If no scan key existed for the previous column, we are done. + */ + if (i > n_minkeys) + need_next_min = false; + + if (i > n_maxkeys) + need_next_max = false; + + foreach(lc1, matchedclauses[i]) + { + Expr *clause = lfirst(lc1); + Const *rightop = (Const *) get_rightop(clause); + Oid opno = ((OpExpr *) clause)->opno, + opfamily = rel->part_scheme->partopfamily[i]; + StrategyNumber strategy; + + strategy = get_op_opfamily_strategy(opno, opfamily); + switch (strategy) + { + case BTLessStrategyNumber: + case BTLessEqualStrategyNumber: + if (need_next_max) + { + maxkeys[i] = rightop->constvalue; + if (!maxkey_set[i]) + n_maxkeys++; + maxkey_set[i] = true; + max_incl = (strategy == BTLessEqualStrategyNumber); + } + if (strategy == BTLessStrategyNumber) + need_next_max = false; + break; + + case BTGreaterStrategyNumber: + case BTGreaterEqualStrategyNumber: + if (need_next_min) + { + minkeys[i] = rightop->constvalue; + if (!minkey_set[i]) + n_minkeys++; + minkey_set[i] = true; + min_incl = (strategy == BTGreaterEqualStrategyNumber); + } + if (strategy == BTGreaterStrategyNumber) + need_next_min = false; + break; + + case BTEqualStrategyNumber: + if (need_next_min) + { + minkeys[i] = rightop->constvalue; + if (!minkey_set[i]) + n_minkeys++; + } + minkey_set[i] = true; + min_incl = true; + + if (need_next_max) + { + maxkeys[i] = rightop->constvalue; + if (!maxkey_set[i]) + n_maxkeys++; + } + maxkey_set[i] = true; + max_incl = true; + break; + + /* + * This might mean '<>', but we don't have anything for that + * case yet. Perhaps, handle that as key < const OR + * key > const, once we have props needed for handling OR + * clauses. + */ + default: + min_incl = max_incl = false; + break; + } + } + } + + /* Ask partition.c which partitions it thinks match the keys. */ + indexes = get_partitions_for_keys(parent, keynullness, + minkeys, n_minkeys, min_incl, + maxkeys, n_maxkeys, max_incl, + &rel->painfo->min_datum_idx, + &rel->painfo->max_datum_idx, + &rel->painfo->contains_null_partition, + &rel->painfo->contains_default_partition); + + if (indexes != NIL) + { +#ifdef USE_ASSERT_CHECKING + int first_index, + last_index; + first_index = linitial_int(indexes); + last_index = llast_int(indexes); + Assert(first_index <= last_index || + rel->part_scheme->strategy != PARTITION_STRATEGY_RANGE); +#endif + + foreach(lc1, indexes) + { + int partidx = lfirst_int(lc1); + AppendRelInfo *appinfo = rel->child_appinfos[partidx]; +#ifdef USE_ASSERT_CHECKING + RangeTblEntry *rte = planner_rt_fetch(appinfo->child_relid, root); + Assert(partdesc->oids[partidx] == rte->relid); +#endif + result = lappend(result, appinfo); + } + } + + /* Remember for future users such as set_append_rel_pathlist(). */ + rel->painfo->live_partition_appinfos = result; + + heap_close(parent, NoLock); + + return result; +} + +/* * set_append_rel_size * Set size estimates for a simple "append relation" * @@ -860,6 +1106,7 @@ static void set_append_rel_size(PlannerInfo *root, RelOptInfo *rel, Index rti, RangeTblEntry *rte) { + List *rel_appinfos = NIL; int parentRTindex = rti; bool has_live_children; double parent_rows; @@ -873,6 +1120,24 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel, Assert(IS_SIMPLE_REL(rel)); + if (rte->relkind != RELKIND_PARTITIONED_TABLE) + { + foreach (l, root->append_rel_list) + { + AppendRelInfo *appinfo = lfirst(l); + + /* append_rel_list contains all append rels; ignore others */ + if (appinfo->parent_relid == parentRTindex) + rel_appinfos = lappend(rel_appinfos, appinfo); + } + } + else + { + rel_appinfos = get_rel_partitions(root, rel, rte); + Assert(rel->painfo != NULL); + rel->painfo->live_partitioned_rels = list_make1_int(rti); + } + /* * Initialize to compute size estimates for whole append relation. * @@ -893,7 +1158,7 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel, nattrs = rel->max_attr - rel->min_attr + 1; parent_attrsizes = (double *) palloc0(nattrs * sizeof(double)); - foreach(l, root->append_rel_list) + foreach(l, rel_appinfos) { AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l); int childRTindex; @@ -906,10 +1171,6 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel, ListCell *childvars; ListCell *lc; - /* append_rel_list contains all append rels; ignore others */ - if (appinfo->parent_relid != parentRTindex) - continue; - childRTindex = appinfo->child_relid; childRTE = root->simple_rte_array[childRTindex]; @@ -1118,6 +1379,17 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel, has_live_children = true; /* + * If childrel is itself partitioned, add it and its partitioned + * children to the list being propagated up to the root rel. + */ + if (childrel->painfo && rel->painfo) + { + rel->painfo->live_partitioned_rels = + list_concat(rel->painfo->live_partitioned_rels, + list_copy(childrel->painfo->live_partitioned_rels)); + } + + /* * If any live child is not parallel-safe, treat the whole appendrel * as not parallel-safe. In future we might be able to generate plans * in which some children are farmed out to workers while others are @@ -1213,14 +1485,29 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, Index rti, RangeTblEntry *rte) { int parentRTindex = rti; - List *live_childrels = NIL; + List *rel_appinfos = NIL, + *live_childrels = NIL; ListCell *l; + if (rte->relkind != RELKIND_PARTITIONED_TABLE) + { + foreach (l, root->append_rel_list) + { + AppendRelInfo *appinfo = lfirst(l); + + /* append_rel_list contains all append rels; ignore others */ + if (appinfo->parent_relid == parentRTindex) + rel_appinfos = lappend(rel_appinfos, appinfo); + } + } + else + rel_appinfos = rel->painfo->live_partition_appinfos; + /* * Generate access paths for each member relation, and remember the * non-dummy children. */ - foreach(l, root->append_rel_list) + foreach(l, rel_appinfos) { AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l); int childRTindex; @@ -1291,33 +1578,31 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel, ListCell *l; List *partitioned_rels = NIL; RangeTblEntry *rte; - bool build_partitioned_rels = false; /* - * A root partition will already have a PartitionedChildRelInfo, and a - * non-root partitioned table doesn't need one, because its Append paths - * will get flattened into the parent anyway. For a subquery RTE, no - * PartitionedChildRelInfo exists; we collect all partitioned_rels - * associated with any child. (This assumes that we don't need to look - * through multiple levels of subquery RTEs; if we ever do, we could - * create a PartitionedChildRelInfo with the accumulated list of - * partitioned_rels which would then be found when populated our parent - * rel with paths. For the present, that appears to be unnecessary.) + * AppendPath we are about to generate must record the RT indexes of + * partitioned tables that are direct or indirect children of this Append + * rel. For partitioned tables, we collect its live partitioned children + * from rel->painfo. However, it will contain only its immediate children, + * so collect live partitioned children from all children that are + * themselves partitioned and concatenate to our list before finally + * passing the list to create_append_path() and/or + * generate_mergeappend_paths(). + * + * If this is a sub-query RTE, its RelOptInfo doesn't itself contain the + * list of live partitioned children, so we must assemble the same in the + * loop below from the children that are known to correspond to + * partitioned rels. (This assumes that we don't need to look through + * multiple levels of subquery RTEs; if we ever do, we could consider + * stuffing the list we generate here into sub-query RTE's RelOptInfo, just + * like we do for partitioned rels, which would be used when populating our + * parent rel with paths. For the present, that appears to be + * unnecessary.) */ rte = planner_rt_fetch(rel->relid, root); - switch (rte->rtekind) - { - case RTE_RELATION: - if (rte->relkind == RELKIND_PARTITIONED_TABLE) - partitioned_rels = - get_partitioned_child_rels(root, rel->relid); - break; - case RTE_SUBQUERY: - build_partitioned_rels = true; - break; - default: - elog(ERROR, "unexpcted rtekind: %d", (int) rte->rtekind); - } + if (rte->rtekind == RTE_RELATION && + rte->relkind == RELKIND_PARTITIONED_TABLE) + partitioned_rels = rel->painfo->live_partitioned_rels; /* * For every non-dummy child, remember the cheapest path. Also, identify @@ -1330,17 +1615,12 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel, ListCell *lcp; /* - * If we need to build partitioned_rels, accumulate the partitioned - * rels for this child. + * Accumulate the live partitioned children of this child, if it's + * itself partitioned rel. */ - if (build_partitioned_rels) - { - List *cprels; - - cprels = get_partitioned_child_rels(root, childrel->relid); + if (childrel->painfo) partitioned_rels = list_concat(partitioned_rels, - list_copy(cprels)); - } + childrel->painfo->live_partitioned_rels); /* * If child has an unparameterized cheapest-total path, add that to diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h index 454a940a23..4f27d3018e 100644 --- a/src/include/catalog/partition.h +++ b/src/include/catalog/partition.h @@ -99,6 +99,7 @@ extern int get_partition_for_tuple(PartitionDispatch *pd, EState *estate, PartitionDispatchData **failed_at, TupleTableSlot **failed_slot); + extern Oid get_default_oid_from_partdesc(PartitionDesc partdesc); extern Oid get_default_partition_oid(Oid parentId); extern void update_default_partition_oid(Oid parentId, Oid defaultPartId); @@ -106,4 +107,12 @@ extern void check_default_allows_bound(Relation parent, Relation defaultRel, PartitionBoundSpec *new_spec); extern List *get_proposed_default_constraint(List *new_part_constaints); +/* Planner support stuff. */ +extern List *get_partitions_for_keys(Relation rel, + NullTestType *keynullness, + Datum *minkeys, int n_minkeys, bool min_inclusive, + Datum *maxkeys, int n_maxkeys, bool max_inclusive, + int *min_datum_index, int *max_datum_index, + bool *null_partition_chosen, + bool *default_partition_chosen); #endif /* PARTITION_H */ -- 2.11.0
From a5725aca1168b6539ea591a886775a7f3d170e8d Mon Sep 17 00:00:00 2001 From: amit <amitlangot...@gmail.com> Date: Tue, 22 Aug 2017 11:33:27 +0900 Subject: [PATCH 3/5] WIP: Interface changes for partition_bound_{cmp/bsearch} Introduces a notion of PartitionBoundCmpArg, which replaces the set of arguments void *probe and bool probe_is_bound of both partition_bound_cmp and partition_bound_bsearch. It wasn't possible before to specify the number of datums when a non-bound type of probe is passed. Slightly tweaking the existing interface to allow specifying the same seems awkward. So, instead encapsulate that into PartitionBoundCmpArg. Also, modify partition_rbound_datum_cmp to compare caller-specifed number of datums, instead of key->partnatts datums. --- src/backend/catalog/partition.c | 135 ++++++++++++++++++++++++++++------------ 1 file changed, 96 insertions(+), 39 deletions(-) diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c index ccf8a1fa67..0133748234 100644 --- a/src/backend/catalog/partition.c +++ b/src/backend/catalog/partition.c @@ -111,6 +111,30 @@ typedef struct PartitionRangeBound bool lower; /* this is the lower (vs upper) bound */ } PartitionRangeBound; +/* + * PartitionBoundCmpArg - Caller-defined argument to be passed to + * partition_bound_cmp() + * + * The first (fixed) argument involved in a comparison is the user-defined + * partition bound of a given existing partition, while an instance of the + * following struct describes either a new partition bound being compared + * against existing bounds (is_bound is true in that case and either lbound + * or rbound is set), or a new tuple's partition key specified in datums + * (ndatums = number of partition key columns). + */ +typedef struct PartitionBoundCmpArg +{ + bool is_bound; + union + { + PartitionListValue *lbound; + PartitionRangeBound *rbound; + } bound; + + Datum *datums; + int ndatums; +} PartitionBoundCmpArg; + static int32 qsort_partition_list_value_cmp(const void *a, const void *b, void *arg); static int32 qsort_partition_rbound_cmp(const void *a, const void *b, @@ -139,14 +163,15 @@ static int32 partition_rbound_cmp(PartitionKey key, bool lower1, PartitionRangeBound *b2); static int32 partition_rbound_datum_cmp(PartitionKey key, Datum *rb_datums, PartitionRangeDatumKind *rb_kind, - Datum *tuple_datums); + Datum *tuple_datums, int n_tuple_datums); static int32 partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo, - int offset, void *probe, bool probe_is_bound); + int offset, PartitionBoundCmpArg *arg); static int partition_bound_bsearch(PartitionKey key, PartitionBoundInfo boundinfo, - void *probe, bool probe_is_bound, bool *is_equal); + PartitionBoundCmpArg *arg, + bool *is_equal); static void get_partition_dispatch_recurse(Relation rel, Relation parent, List **pds, List **leaf_part_oids); @@ -755,10 +780,16 @@ check_new_partition_bound(char *relname, Relation parent, { int offset; bool equal; - + PartitionListValue lbound; + PartitionBoundCmpArg arg; + + memset(&lbound, 0, sizeof(PartitionListValue)); + lbound.value = val->constvalue; + memset(&arg, 0, sizeof(PartitionBoundCmpArg)); + arg.is_bound = true; + arg.bound.lbound = &lbound; offset = partition_bound_bsearch(key, boundinfo, - &val->constvalue, - true, &equal); + &arg, &equal); if (offset >= 0 && equal) { overlap = true; @@ -809,6 +840,7 @@ check_new_partition_bound(char *relname, Relation parent, PartitionBoundInfo boundinfo = partdesc->boundinfo; int offset; bool equal; + PartitionBoundCmpArg arg; Assert(boundinfo && boundinfo->strategy == PARTITION_STRATEGY_RANGE && @@ -830,8 +862,11 @@ check_new_partition_bound(char *relname, Relation parent, * since the index array is initialised with an extra -1 * at the end. */ - offset = partition_bound_bsearch(key, boundinfo, lower, - true, &equal); + memset(&arg, 0, sizeof(PartitionBoundCmpArg)); + arg.is_bound = true; + arg.bound.rbound = lower; + offset = partition_bound_bsearch(key, boundinfo, &arg, + &equal); if (boundinfo->indexes[offset + 1] < 0) { @@ -845,9 +880,9 @@ check_new_partition_bound(char *relname, Relation parent, { int32 cmpval; + arg.bound.rbound = upper; cmpval = partition_bound_cmp(key, boundinfo, - offset + 1, upper, - true); + offset + 1, &arg); if (cmpval < 0) { /* @@ -2380,12 +2415,15 @@ get_partition_for_tuple(PartitionDispatch *pd, { bool equal = false; int cur_offset; + PartitionBoundCmpArg arg; + memset(&arg, 0, sizeof(PartitionBoundCmpArg)); + arg.is_bound = false; + arg.datums = values; + arg.ndatums = key->partnatts; cur_offset = partition_bound_bsearch(key, partdesc->boundinfo, - values, - false, - &equal); + &arg, &equal); if (cur_offset >= 0 && equal) cur_index = partdesc->boundinfo->indexes[cur_offset]; } @@ -2397,6 +2435,7 @@ get_partition_for_tuple(PartitionDispatch *pd, range_partkey_has_null = false; int cur_offset; int i; + PartitionBoundCmpArg arg; /* * No range includes NULL, so this will be accepted by the @@ -2427,12 +2466,13 @@ get_partition_for_tuple(PartitionDispatch *pd, if (range_partkey_has_null) break; + memset(&arg, 0, sizeof(PartitionBoundCmpArg)); + arg.is_bound = false; + arg.datums = values; + arg.ndatums = key->partnatts; cur_offset = partition_bound_bsearch(key, partdesc->boundinfo, - values, - false, - &equal); - + &arg, &equal); /* * The offset returned is such that the bound at * cur_offset is less than or equal to the tuple value, so @@ -2629,12 +2669,12 @@ partition_rbound_cmp(PartitionKey key, static int32 partition_rbound_datum_cmp(PartitionKey key, Datum *rb_datums, PartitionRangeDatumKind *rb_kind, - Datum *tuple_datums) + Datum *tuple_datums, int n_tuple_datums) { int i; int32 cmpval = -1; - for (i = 0; i < key->partnatts; i++) + for (i = 0; i < n_tuple_datums; i++) { if (rb_kind[i] == PARTITION_RANGE_DATUM_MINVALUE) return -1; @@ -2656,11 +2696,11 @@ partition_rbound_datum_cmp(PartitionKey key, * partition_bound_cmp * * Return whether the bound at offset in boundinfo is <, =, or > the argument - * specified in *probe. + * specified in *arg. */ static int32 partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo, - int offset, void *probe, bool probe_is_bound) + int offset, PartitionBoundCmpArg *arg) { Datum *bound_datums = boundinfo->datums[offset]; int32 cmpval = -1; @@ -2668,17 +2708,35 @@ partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo, switch (key->strategy) { case PARTITION_STRATEGY_LIST: - cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[0], - key->partcollation[0], - bound_datums[0], - *(Datum *) probe)); - break; + { + Datum listdatum; + + if (arg->is_bound) + listdatum = arg->bound.lbound->value; + else + { + if (arg->ndatums >= 1) + listdatum = arg->datums[0]; + /* + * If no tuple datum to compare with the bound, consider + * the latter to be greater. + */ + else + return 1; + } + + cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[0], + key->partcollation[0], + bound_datums[0], + listdatum)); + break; + } case PARTITION_STRATEGY_RANGE: { PartitionRangeDatumKind *kind = boundinfo->kind[offset]; - if (probe_is_bound) + if (arg->is_bound) { /* * We need to pass whether the existing bound is a lower @@ -2689,12 +2747,13 @@ partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo, cmpval = partition_rbound_cmp(key, bound_datums, kind, lower, - (PartitionRangeBound *) probe); + arg->bound.rbound); } else cmpval = partition_rbound_datum_cmp(key, bound_datums, kind, - (Datum *) probe); + arg->datums, + arg->ndatums); break; } @@ -2708,20 +2767,19 @@ partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo, /* * Binary search on a collection of partition bounds. Returns greatest - * bound in array boundinfo->datums which is less than or equal to *probe. - * If all bounds in the array are greater than *probe, -1 is returned. + * bound in array boundinfo->datums which is less than or equal to *arg. + * If all bounds in the array are greater than *arg, -1 is returned. * - * *probe could either be a partition bound or a Datum array representing - * the partition key of a tuple being routed; probe_is_bound tells which. - * We pass that down to the comparison function so that it can interpret the - * contents of *probe accordingly. + * *arg could either be a partition bound or a Datum array representing + * the partition key of a tuple being routed. We simply pass that down to + * partition_bound_cmp where it is interpreted appropriately. * * *is_equal is set to whether the bound at the returned index is equal with - * *probe. + * *arg. */ static int partition_bound_bsearch(PartitionKey key, PartitionBoundInfo boundinfo, - void *probe, bool probe_is_bound, bool *is_equal) + PartitionBoundCmpArg *arg, bool *is_equal) { int lo, hi, @@ -2734,8 +2792,7 @@ partition_bound_bsearch(PartitionKey key, PartitionBoundInfo boundinfo, int32 cmpval; mid = (lo + hi + 1) / 2; - cmpval = partition_bound_cmp(key, boundinfo, mid, probe, - probe_is_bound); + cmpval = partition_bound_cmp(key, boundinfo, mid, arg); if (cmpval <= 0) { lo = mid; -- 2.11.0
From 28778969ecddb2a9c3f31ff2ed119a342e97fb2d Mon Sep 17 00:00:00 2001 From: amit <amitlangot...@gmail.com> Date: Tue, 22 Aug 2017 13:48:13 +0900 Subject: [PATCH 4/5] WIP: Implement get_partitions_for_keys() Disable constraint exclusion that occurs using internal partition constraints, so that it's apparent what the new partition-pruning code still needs to do to able to create a plan matching the plain the the traditional constraint exclusion based partition-pruning would result in. --- src/backend/catalog/partition.c | 210 ++++++++++++++++++++++++++++++++++- src/backend/optimizer/util/plancat.c | 4 + 2 files changed, 209 insertions(+), 5 deletions(-) diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c index 0133748234..9d4b7c1a7f 100644 --- a/src/backend/catalog/partition.c +++ b/src/backend/catalog/partition.c @@ -1391,8 +1391,6 @@ get_partition_dispatch_recurse(Relation rel, Relation parent, * datum that the query's bounding keys allow to be returned for the query. * Similarly, *max_datum_index. *null_partition_chosen returns whether * the null partition will be scanned. - * - * TODO: Implement. */ List * get_partitions_for_keys(Relation rel, @@ -1403,12 +1401,214 @@ get_partitions_for_keys(Relation rel, bool *null_partition_chosen, bool *default_partition_chosen) { + int i, + minoff, + maxoff; List *result = NIL; - int i; + PartitionKey partkey = RelationGetPartitionKey(rel); PartitionDesc partdesc = RelationGetPartitionDesc(rel); + PartitionBoundCmpArg arg; + bool is_equal, + scan_default = false; + int null_partition_idx = partdesc->boundinfo->null_index; - for (i = 0; i < partdesc->nparts; i++) - result = lappend_int(result, i); + *null_partition_chosen = false; + + /* + * Check if any of the scan keys are null. If so, return the only + * null-accepting partition if partdesc->boundinfo says there is one. + */ + for (i = 0; i < partkey->partnatts; i++) + { + if (keynullness[i] == IS_NULL) + { + if (null_partition_idx >= 0) + { + *null_partition_chosen = true; + result = list_make1_int(null_partition_idx); + } + else + result = NIL; + + return result; + } + } + + /* + * If query provides no quals, don't forget to scan the default partition. + */ + if (n_minkeys == 0 && n_maxkeys == 0) + scan_default = true; + + if (n_minkeys > 0 && partdesc->nparts > 0) + { + memset(&arg, 0, sizeof(PartitionBoundCmpArg)); + arg.datums = minkeys; + arg.ndatums = n_minkeys; + minoff = partition_bound_bsearch(partkey, partdesc->boundinfo, + &arg, &is_equal); + + if (is_equal && arg.ndatums < partkey->partnatts) + { + int32 cmpval; + + is_equal = false; + + do + { + if (min_inclusive) + minoff -= 1; + else + minoff += 1; + if (minoff < 0 || + minoff >= partdesc->boundinfo->ndatums) + break; + cmpval = partition_bound_cmp(partkey, partdesc->boundinfo, + minoff, &arg); + } while (cmpval == 0); + } + + /* Interpret the result per partition strategy. */ + switch (partkey->strategy) + { + case PARTITION_STRATEGY_LIST: + /* + * Found, but if the query may have asked us to exclude it. + */ + if (is_equal && !min_inclusive) + minoff++; + break; + + case PARTITION_STRATEGY_RANGE: + /* + * Records returned by the query will be > bounds[minoff], + * because min_scankey is >= bounds[minoff], that is, no + * records of the partition at minoff will be returned. Go + * to the next bound. + */ + if (minoff < partdesc->boundinfo->ndatums - 1) + minoff += 1; + + /* + * Make sure to skip a gap. + * Note: There are ndatums + 1 lots in the indexes array. + */ + if (partdesc->boundinfo->indexes[minoff] < 0 && + partdesc->boundinfo->indexes[minoff + 1] >= 0) + minoff += 1; + break; + } + } + else + minoff = 0; + + if (n_maxkeys > 0 && partdesc->nparts > 0) + { + memset(&arg, 0, sizeof(PartitionBoundCmpArg)); + arg.datums = maxkeys; + arg.ndatums = n_maxkeys; + maxoff = partition_bound_bsearch(partkey, partdesc->boundinfo, + &arg, &is_equal); + if (is_equal && arg.ndatums < partkey->partnatts) + { + int32 cmpval; + + is_equal = false; + + do + { + if (max_inclusive) + maxoff += 1; + else + maxoff -= 1; + if (maxoff < 0 || + maxoff >= partdesc->boundinfo->ndatums) + break; + cmpval = partition_bound_cmp(partkey, partdesc->boundinfo, + maxoff, &arg); + } while (cmpval == 0); + + /* Back up if went too far. */ + if (max_inclusive) + maxoff -= 1; + } + + /* Interpret the result per partition strategy. */ + switch (partkey->strategy) + { + case PARTITION_STRATEGY_LIST: + /* + * Found, but if the query may have asked us to exclude it. + */ + if (is_equal && !max_inclusive) + maxoff--; + break; + + case PARTITION_STRATEGY_RANGE: + /* + * Because bounds[maxoff] <= max_scankey, we may need to + * to consider the next partition as well, in addition to + * the partition at maxoff and earlier. + */ + if (!is_equal || max_inclusive) + maxoff += 1; + + /* Make sure to skip a gap. */ + if (partdesc->boundinfo->indexes[maxoff] < 0 && maxoff >= 1) + maxoff -= 1; + break; + } + } + else + maxoff = partdesc->boundinfo->ndatums - 1; + + *min_datum_index = minoff; + *max_datum_index = maxoff; + + switch (partkey->strategy) + { + case PARTITION_STRATEGY_LIST: + for (i = minoff; i <= maxoff; i++) + { + int partition_idx = partdesc->boundinfo->indexes[i]; + + /* + * Multiple values may belong to the same partition, so make + * sure we don't add the same partition index again. + */ + result = list_append_unique_int(result, partition_idx); + } + + /* If no bounding keys exist, include the null partition too. */ + if (null_partition_idx >= 0 && + (keynullness[0] == -1 || keynullness[0] != IS_NOT_NULL)) + { + *null_partition_chosen = true; + result = list_append_unique_int(result, null_partition_idx); + } + + break; + + case PARTITION_STRATEGY_RANGE: + for (i = minoff; i <= maxoff; i++) + { + int partition_idx = partdesc->boundinfo->indexes[i]; + + /* + * If a valid partition exists for this range, add its + * index, if not, the default partition (if any) would be + * covering that range, so request to include the same. + */ + if (partition_idx >= 0) + result = lappend_int(result, partition_idx); + else + scan_default = true; + } + break; + } + + if (scan_default && partdesc->boundinfo->default_index >= 0) + result = lappend_int(result, partdesc->boundinfo->default_index); return result; } diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c index f7e3a1df5f..26ea2b4162 100644 --- a/src/backend/optimizer/util/plancat.c +++ b/src/backend/optimizer/util/plancat.c @@ -1155,7 +1155,9 @@ get_relation_constraints(PlannerInfo *root, Index varno = rel->relid; Relation relation; TupleConstr *constr; +#ifdef USE_PARTITION_CONSTRAINT_FOR_PRUNING List *pcqual; +#endif /* * We assume the relation has already been safely locked. @@ -1241,6 +1243,7 @@ get_relation_constraints(PlannerInfo *root, } } +#ifdef USE_PARTITION_CONSTRAINT_FOR_PRUNING /* Append partition predicates, if any */ pcqual = RelationGetPartitionQual(relation); if (pcqual) @@ -1258,6 +1261,7 @@ get_relation_constraints(PlannerInfo *root, result = list_concat(result, pcqual); } +#endif heap_close(relation, NoLock); -- 2.11.0
From a8660afe1d147dcaa5ff46a8ed4faf366242d4d0 Mon Sep 17 00:00:00 2001 From: amit <amitlangot...@gmail.com> Date: Tue, 5 Sep 2017 10:28:23 +0900 Subject: [PATCH 5/5] Add more tests for the new partitioning-related planning code --- src/test/regress/expected/partition.out | 465 ++++++++++++++++++++++++++++++++ src/test/regress/parallel_schedule | 2 +- src/test/regress/serial_schedule | 1 + src/test/regress/sql/partition.sql | 82 ++++++ 4 files changed, 549 insertions(+), 1 deletion(-) create mode 100644 src/test/regress/expected/partition.out create mode 100644 src/test/regress/sql/partition.sql diff --git a/src/test/regress/expected/partition.out b/src/test/regress/expected/partition.out new file mode 100644 index 0000000000..db92368fc5 --- /dev/null +++ b/src/test/regress/expected/partition.out @@ -0,0 +1,465 @@ +-- +-- Test partitioning planner code +-- +create table lp (a char) partition by list (a); +create table lp_ef partition of lp for values in ('e', 'f'); +create table lp_ad partition of lp for values in ('a', 'd'); +create table lp_bc partition of lp for values in ('b', 'c'); +create table lp_null partition of lp for values in (null); +explain (costs off) select * from lp; + QUERY PLAN +--------------------------- + Append + -> Seq Scan on lp_ad + -> Seq Scan on lp_bc + -> Seq Scan on lp_ef + -> Seq Scan on lp_null +(5 rows) + +explain (costs off) select * from lp where a > 'a' and a < 'd'; + QUERY PLAN +----------------------------------------------------------- + Append + -> Seq Scan on lp_bc + Filter: ((a > 'a'::bpchar) AND (a < 'd'::bpchar)) +(3 rows) + +explain (costs off) select * from lp where a > 'a' and a <= 'd'; + QUERY PLAN +------------------------------------------------------------ + Append + -> Seq Scan on lp_bc + Filter: ((a > 'a'::bpchar) AND (a <= 'd'::bpchar)) + -> Seq Scan on lp_ad + Filter: ((a > 'a'::bpchar) AND (a <= 'd'::bpchar)) +(5 rows) + +explain (costs off) select * from lp where a = 'a'; + QUERY PLAN +----------------------------------- + Append + -> Seq Scan on lp_ad + Filter: (a = 'a'::bpchar) +(3 rows) + +explain (costs off) select * from lp where 'a' = a; /* commutates */ + QUERY PLAN +----------------------------------- + Append + -> Seq Scan on lp_ad + Filter: ('a'::bpchar = a) +(3 rows) + +explain (costs off) select * from lp where a is not null; + QUERY PLAN +--------------------------------- + Append + -> Seq Scan on lp_ad + Filter: (a IS NOT NULL) + -> Seq Scan on lp_bc + Filter: (a IS NOT NULL) + -> Seq Scan on lp_ef + Filter: (a IS NOT NULL) +(7 rows) + +explain (costs off) select * from lp where a is null; + QUERY PLAN +----------------------------- + Append + -> Seq Scan on lp_null + Filter: (a IS NULL) +(3 rows) + +create table rlp (a int, b varchar) partition by range (a); +create table rlp1 partition of rlp for values from (minvalue) to (1); +create table rlp2 partition of rlp for values from (1) to (10); +create table rlp3 (b varchar, a int) partition by list (b varchar_ops); +create table rlp3abcd partition of rlp3 for values in ('ab', 'cd'); +create table rlp3efgh partition of rlp3 for values in ('ef', 'gh'); +create table rlp3nullxy partition of rlp3 for values in (null, 'xy'); +alter table rlp attach partition rlp3 for values from (15) to (20); +create table rlp4 partition of rlp for values from (20) to (30); +create table rlp5 partition of rlp for values from (31) to (maxvalue); +explain (costs off) select * from rlp where a < 1; + QUERY PLAN +------------------------- + Append + -> Seq Scan on rlp1 + Filter: (a < 1) +(3 rows) + +explain (costs off) select * from rlp where 1 > a; /* commutates */ + QUERY PLAN +------------------------- + Append + -> Seq Scan on rlp1 + Filter: (1 > a) +(3 rows) + +explain (costs off) select * from rlp where a <= 1; + QUERY PLAN +-------------------------- + Append + -> Seq Scan on rlp1 + Filter: (a <= 1) + -> Seq Scan on rlp2 + Filter: (a <= 1) +(5 rows) + +explain (costs off) select * from rlp where a = 1; + QUERY PLAN +------------------------- + Append + -> Seq Scan on rlp2 + Filter: (a = 1) +(3 rows) + +explain (costs off) select * from rlp where a = 1::bigint; /* same as above */ + QUERY PLAN +----------------------------------- + Append + -> Seq Scan on rlp2 + Filter: (a = '1'::bigint) +(3 rows) + +explain (costs off) select * from rlp where a = 1::numeric; /* no pruning */ + QUERY PLAN +----------------------------------------------- + Append + -> Seq Scan on rlp1 + Filter: ((a)::numeric = '1'::numeric) + -> Seq Scan on rlp2 + Filter: ((a)::numeric = '1'::numeric) + -> Seq Scan on rlp3abcd + Filter: ((a)::numeric = '1'::numeric) + -> Seq Scan on rlp3efgh + Filter: ((a)::numeric = '1'::numeric) + -> Seq Scan on rlp3nullxy + Filter: ((a)::numeric = '1'::numeric) + -> Seq Scan on rlp4 + Filter: ((a)::numeric = '1'::numeric) + -> Seq Scan on rlp5 + Filter: ((a)::numeric = '1'::numeric) +(15 rows) + +explain (costs off) select * from rlp where a <= 10; + QUERY PLAN +--------------------------- + Append + -> Seq Scan on rlp1 + Filter: (a <= 10) + -> Seq Scan on rlp2 + Filter: (a <= 10) +(5 rows) + +explain (costs off) select * from rlp where a > 10; + QUERY PLAN +------------------------------ + Append + -> Seq Scan on rlp3abcd + Filter: (a > 10) + -> Seq Scan on rlp3efgh + Filter: (a > 10) + -> Seq Scan on rlp3nullxy + Filter: (a > 10) + -> Seq Scan on rlp4 + Filter: (a > 10) + -> Seq Scan on rlp5 + Filter: (a > 10) +(11 rows) + +explain (costs off) select * from rlp where a < 15; + QUERY PLAN +-------------------------- + Append + -> Seq Scan on rlp1 + Filter: (a < 15) + -> Seq Scan on rlp2 + Filter: (a < 15) +(5 rows) + +explain (costs off) select * from rlp where a <= 15; + QUERY PLAN +------------------------------ + Append + -> Seq Scan on rlp1 + Filter: (a <= 15) + -> Seq Scan on rlp2 + Filter: (a <= 15) + -> Seq Scan on rlp3abcd + Filter: (a <= 15) + -> Seq Scan on rlp3efgh + Filter: (a <= 15) + -> Seq Scan on rlp3nullxy + Filter: (a <= 15) +(11 rows) + +explain (costs off) select * from rlp where a > 15 and b = 'ab'; + QUERY PLAN +--------------------------------------------------------- + Append + -> Seq Scan on rlp3abcd + Filter: ((a > 15) AND ((b)::text = 'ab'::text)) + -> Seq Scan on rlp4 + Filter: ((a > 15) AND ((b)::text = 'ab'::text)) + -> Seq Scan on rlp5 + Filter: ((a > 15) AND ((b)::text = 'ab'::text)) +(7 rows) + +explain (costs off) select * from rlp where a = 16 and b is null; + QUERY PLAN +-------------------------------------------- + Append + -> Seq Scan on rlp3nullxy + Filter: ((b IS NULL) AND (a = 16)) +(3 rows) + +explain (costs off) select * from rlp where a = 16 and b is not null; + QUERY PLAN +------------------------------------------------ + Append + -> Seq Scan on rlp3abcd + Filter: ((b IS NOT NULL) AND (a = 16)) + -> Seq Scan on rlp3efgh + Filter: ((b IS NOT NULL) AND (a = 16)) + -> Seq Scan on rlp3nullxy + Filter: ((b IS NOT NULL) AND (a = 16)) +(7 rows) + +explain (costs off) select * from rlp where a is null; /* while we're on nulls */ + QUERY PLAN +-------------------------- + Result + One-Time Filter: false +(2 rows) + +explain (costs off) select * from rlp where a > 30; + QUERY PLAN +-------------------------- + Append + -> Seq Scan on rlp5 + Filter: (a > 30) +(3 rows) + +explain (costs off) select * from rlp where a = 30; /* empty */ + QUERY PLAN +-------------------------- + Result + One-Time Filter: false +(2 rows) + +explain (costs off) select * from rlp where a <= 31; + QUERY PLAN +------------------------------ + Append + -> Seq Scan on rlp1 + Filter: (a <= 31) + -> Seq Scan on rlp2 + Filter: (a <= 31) + -> Seq Scan on rlp3abcd + Filter: (a <= 31) + -> Seq Scan on rlp3efgh + Filter: (a <= 31) + -> Seq Scan on rlp3nullxy + Filter: (a <= 31) + -> Seq Scan on rlp4 + Filter: (a <= 31) + -> Seq Scan on rlp5 + Filter: (a <= 31) +(15 rows) + +-- multi-column keys +create table mc3p (a int, b int, c int) partition by range (a, abs(b), c); +create table mc3p0 partition of mc3p for values from (minvalue, 0, 0) to (1, 1, 1); +create table mc3p1 partition of mc3p for values from (1, 1, 1) to (10, 5, 10); +create table mc3p2 partition of mc3p for values from (10, 5, 10) to (10, 10, 10); +create table mc3p3 partition of mc3p for values from (10, 10, 10) to (10, 10, 20); +create table mc3p4 partition of mc3p for values from (10, 10, 20) to (10, maxvalue, maxvalue); +create table mc3p5 partition of mc3p for values from (11, 1, 1) to (20, 10, 10); +create table mc3p6 partition of mc3p for values from (20, 10, 10) to (20, 20, 20); +create table mc3p7 partition of mc3p for values from (20, 20, 20) to (maxvalue, 0, 0); +explain (costs off) select * from mc3p where a = 1; + QUERY PLAN +------------------------- + Append + -> Seq Scan on mc3p0 + Filter: (a = 1) + -> Seq Scan on mc3p1 + Filter: (a = 1) +(5 rows) + +explain (costs off) select * from mc3p where a = 1 and abs(b) < 1; + QUERY PLAN +-------------------------------------------- + Append + -> Seq Scan on mc3p0 + Filter: ((a = 1) AND (abs(b) < 1)) +(3 rows) + +explain (costs off) select * from mc3p where a = 1 and abs(b) = 1; + QUERY PLAN +-------------------------------------------- + Append + -> Seq Scan on mc3p0 + Filter: ((a = 1) AND (abs(b) = 1)) + -> Seq Scan on mc3p1 + Filter: ((a = 1) AND (abs(b) = 1)) +(5 rows) + +explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8; + QUERY PLAN +-------------------------------------------------------- + Append + -> Seq Scan on mc3p0 + Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1)) + -> Seq Scan on mc3p1 + Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1)) +(5 rows) + +explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35; + QUERY PLAN +----------------------------------------------------------------- + Append + -> Seq Scan on mc3p1 + Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35)) + -> Seq Scan on mc3p2 + Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35)) + -> Seq Scan on mc3p3 + Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35)) + -> Seq Scan on mc3p4 + Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35)) +(9 rows) + +explain (costs off) select * from mc3p where a > 10; + QUERY PLAN +-------------------------- + Append + -> Seq Scan on mc3p5 + Filter: (a > 10) + -> Seq Scan on mc3p6 + Filter: (a > 10) + -> Seq Scan on mc3p7 + Filter: (a > 10) +(7 rows) + +explain (costs off) select * from mc3p where a >= 10; + QUERY PLAN +--------------------------- + Append + -> Seq Scan on mc3p1 + Filter: (a >= 10) + -> Seq Scan on mc3p2 + Filter: (a >= 10) + -> Seq Scan on mc3p3 + Filter: (a >= 10) + -> Seq Scan on mc3p4 + Filter: (a >= 10) + -> Seq Scan on mc3p5 + Filter: (a >= 10) + -> Seq Scan on mc3p6 + Filter: (a >= 10) + -> Seq Scan on mc3p7 + Filter: (a >= 10) +(15 rows) + +explain (costs off) select * from mc3p where a < 10; + QUERY PLAN +-------------------------- + Append + -> Seq Scan on mc3p0 + Filter: (a < 10) + -> Seq Scan on mc3p1 + Filter: (a < 10) +(5 rows) + +explain (costs off) select * from mc3p where a <= 10 and abs(b) < 10; + QUERY PLAN +----------------------------------------------- + Append + -> Seq Scan on mc3p0 + Filter: ((a <= 10) AND (abs(b) < 10)) + -> Seq Scan on mc3p1 + Filter: ((a <= 10) AND (abs(b) < 10)) + -> Seq Scan on mc3p2 + Filter: ((a <= 10) AND (abs(b) < 10)) +(7 rows) + +explain (costs off) select * from mc3p where a = 11 and abs(b) = 0; /* empty */ + QUERY PLAN +-------------------------- + Result + One-Time Filter: false +(2 rows) + +explain (costs off) select * from mc3p where a = 20 and abs(b) = 10 and c = 100; + QUERY PLAN +------------------------------------------------------------ + Append + -> Seq Scan on mc3p6 + Filter: ((a = 20) AND (c = 100) AND (abs(b) = 10)) +(3 rows) + +explain (costs off) select * from mc3p where a > 20; + QUERY PLAN +-------------------------- + Append + -> Seq Scan on mc3p7 + Filter: (a > 20) +(3 rows) + +explain (costs off) select * from mc3p where a >= 20; + QUERY PLAN +--------------------------- + Append + -> Seq Scan on mc3p5 + Filter: (a >= 20) + -> Seq Scan on mc3p6 + Filter: (a >= 20) + -> Seq Scan on mc3p7 + Filter: (a >= 20) +(7 rows) + +-- XXX - '<>' clauses cannot be handled yet +explain (costs off) select * from lp where a <> 'a'; + QUERY PLAN +------------------------------------ + Append + -> Seq Scan on lp_ad + Filter: (a <> 'a'::bpchar) + -> Seq Scan on lp_bc + Filter: (a <> 'a'::bpchar) + -> Seq Scan on lp_ef + Filter: (a <> 'a'::bpchar) +(7 rows) + +-- XXX - redundant clause elimination does not happen yet +explain (costs off) select * from mc3p where a = 10 and a > 1; + QUERY PLAN +---------------------------------------- + Append + -> Seq Scan on mc3p0 + Filter: ((a > 1) AND (a = 10)) + -> Seq Scan on mc3p1 + Filter: ((a > 1) AND (a = 10)) + -> Seq Scan on mc3p2 + Filter: ((a > 1) AND (a = 10)) + -> Seq Scan on mc3p3 + Filter: ((a > 1) AND (a = 10)) + -> Seq Scan on mc3p4 + Filter: ((a > 1) AND (a = 10)) +(11 rows) + +-- XXX - the OR clauses don't contribute to partition-pruning yet +explain (costs off) select * from rlp3 where b = 'ab' or b = 'ef'; + QUERY PLAN +------------------------------------------------------------------------ + Append + -> Seq Scan on rlp3abcd + Filter: (((b)::text = 'ab'::text) OR ((b)::text = 'ef'::text)) + -> Seq Scan on rlp3efgh + Filter: (((b)::text = 'ab'::text) OR ((b)::text = 'ef'::text)) + -> Seq Scan on rlp3nullxy + Filter: (((b)::text = 'ab'::text) OR ((b)::text = 'ef'::text)) +(7 rows) + +drop table lp, rlp, mc3p; diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule index 2fd3f2b1b1..2eb81fcf41 100644 --- a/src/test/regress/parallel_schedule +++ b/src/test/regress/parallel_schedule @@ -60,7 +60,7 @@ test: create_index create_view # ---------- # Another group of parallel tests # ---------- -test: create_aggregate create_function_3 create_cast constraints triggers inherit create_table_like typed_table vacuum drop_if_exists updatable_views rolenames roleattributes create_am hash_func +test: create_aggregate create_function_3 create_cast constraints triggers inherit create_table_like typed_table vacuum drop_if_exists updatable_views rolenames roleattributes create_am hash_func partition # ---------- # sanity_check does a vacuum, affecting the sort order of SELECT * diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule index 76b0de30a7..6611662149 100644 --- a/src/test/regress/serial_schedule +++ b/src/test/regress/serial_schedule @@ -71,6 +71,7 @@ test: create_cast test: constraints test: triggers test: inherit +test: partition test: create_table_like test: typed_table test: vacuum diff --git a/src/test/regress/sql/partition.sql b/src/test/regress/sql/partition.sql new file mode 100644 index 0000000000..616ad95611 --- /dev/null +++ b/src/test/regress/sql/partition.sql @@ -0,0 +1,82 @@ +-- +-- Test partitioning planner code +-- +create table lp (a char) partition by list (a); +create table lp_ef partition of lp for values in ('e', 'f'); +create table lp_ad partition of lp for values in ('a', 'd'); +create table lp_bc partition of lp for values in ('b', 'c'); +create table lp_null partition of lp for values in (null); +explain (costs off) select * from lp; +explain (costs off) select * from lp where a > 'a' and a < 'd'; +explain (costs off) select * from lp where a > 'a' and a <= 'd'; +explain (costs off) select * from lp where a = 'a'; +explain (costs off) select * from lp where 'a' = a; /* commutates */ +explain (costs off) select * from lp where a is not null; +explain (costs off) select * from lp where a is null; + +create table rlp (a int, b varchar) partition by range (a); +create table rlp1 partition of rlp for values from (minvalue) to (1); +create table rlp2 partition of rlp for values from (1) to (10); + +create table rlp3 (b varchar, a int) partition by list (b varchar_ops); +create table rlp3abcd partition of rlp3 for values in ('ab', 'cd'); +create table rlp3efgh partition of rlp3 for values in ('ef', 'gh'); +create table rlp3nullxy partition of rlp3 for values in (null, 'xy'); +alter table rlp attach partition rlp3 for values from (15) to (20); + +create table rlp4 partition of rlp for values from (20) to (30); +create table rlp5 partition of rlp for values from (31) to (maxvalue); + +explain (costs off) select * from rlp where a < 1; +explain (costs off) select * from rlp where 1 > a; /* commutates */ +explain (costs off) select * from rlp where a <= 1; +explain (costs off) select * from rlp where a = 1; +explain (costs off) select * from rlp where a = 1::bigint; /* same as above */ +explain (costs off) select * from rlp where a = 1::numeric; /* no pruning */ +explain (costs off) select * from rlp where a <= 10; +explain (costs off) select * from rlp where a > 10; +explain (costs off) select * from rlp where a < 15; +explain (costs off) select * from rlp where a <= 15; +explain (costs off) select * from rlp where a > 15 and b = 'ab'; +explain (costs off) select * from rlp where a = 16 and b is null; +explain (costs off) select * from rlp where a = 16 and b is not null; +explain (costs off) select * from rlp where a is null; /* while we're on nulls */ +explain (costs off) select * from rlp where a > 30; +explain (costs off) select * from rlp where a = 30; /* empty */ +explain (costs off) select * from rlp where a <= 31; + +-- multi-column keys +create table mc3p (a int, b int, c int) partition by range (a, abs(b), c); +create table mc3p0 partition of mc3p for values from (minvalue, 0, 0) to (1, 1, 1); +create table mc3p1 partition of mc3p for values from (1, 1, 1) to (10, 5, 10); +create table mc3p2 partition of mc3p for values from (10, 5, 10) to (10, 10, 10); +create table mc3p3 partition of mc3p for values from (10, 10, 10) to (10, 10, 20); +create table mc3p4 partition of mc3p for values from (10, 10, 20) to (10, maxvalue, maxvalue); +create table mc3p5 partition of mc3p for values from (11, 1, 1) to (20, 10, 10); +create table mc3p6 partition of mc3p for values from (20, 10, 10) to (20, 20, 20); +create table mc3p7 partition of mc3p for values from (20, 20, 20) to (maxvalue, 0, 0); + +explain (costs off) select * from mc3p where a = 1; +explain (costs off) select * from mc3p where a = 1 and abs(b) < 1; +explain (costs off) select * from mc3p where a = 1 and abs(b) = 1; +explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8; +explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35; +explain (costs off) select * from mc3p where a > 10; +explain (costs off) select * from mc3p where a >= 10; +explain (costs off) select * from mc3p where a < 10; +explain (costs off) select * from mc3p where a <= 10 and abs(b) < 10; +explain (costs off) select * from mc3p where a = 11 and abs(b) = 0; /* empty */ +explain (costs off) select * from mc3p where a = 20 and abs(b) = 10 and c = 100; +explain (costs off) select * from mc3p where a > 20; +explain (costs off) select * from mc3p where a >= 20; + +-- XXX - '<>' clauses cannot be handled yet +explain (costs off) select * from lp where a <> 'a'; + +-- XXX - redundant clause elimination does not happen yet +explain (costs off) select * from mc3p where a = 10 and a > 1; + +-- XXX - the OR clauses don't contribute to partition-pruning yet +explain (costs off) select * from rlp3 where b = 'ab' or b = 'ef'; + +drop table lp, rlp, mc3p; -- 2.11.0
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers