Hi, So far the extended statistics are applied only at scan level, i.e. when estimating selectivity for individual tables. Which is great, but joins are a known challenge, so let's try doing something about it ...
Konstantin Knizhnik posted a patch [1] using functional dependencies to improve join estimates in January. It's an interesting approach, but as I explained in that thread I think we should try a different approach, similar to how we use MCV lists without extended stats. We'll probably end up considering functional dependencies too, but probably only as a fallback (similar to what we do for single-relation estimates). This is a PoC demonstrating the approach I envisioned. It's incomplete and has various limitations: - no expression support, just plain attribute references - only equality conditions - requires MCV lists on both sides - inner joins only All of this can / should be relaxed later, of course. But for a PoC this seems sufficient. The basic principle is fairly simple, and mimics what eqjoinsel_inner does. Assume we have a query: SELECT * FROM t1 JOIN t2 ON (t1.a = t2.a AND t1.b = t2.b) If we have MCV lists on (t1.a,t1.b) and (t2.a,t2.b) then we can use the same logic as eqjoinsel_inner and "match" them together. If the MCV list is "larger" - e.g. on (a,b,c) - then it's a bit more complicated, but the general idea is the same. To demonstrate this, consider a very simple example with a table that has a lot of dependency between the columns: ================================================================== CREATE TABLE t (a INT, b INT, c INT, d INT); INSERT INTO t SELECT mod(i,100), mod(i,100), mod(i,100), mod(i,100) FROM generate_series(1,100000) s(i); ANALYZE t; SELECT * FROM t t1 JOIN t t2 ON (t1.a = t2.a AND t1.b = t2.b); CREATE STATISTICS s (mcv, ndistinct) ON a,b,c,d FROM t; ANALYZE t; SELECT * FROM t t1 JOIN t t2 ON (t1.a = t2.a AND t1.b = t2.b); ALTER STATISTICS s SET STATISTICS 10000; ANALYZE t; SELECT * FROM t t1 JOIN t t2 ON (t1.a = t2.a AND t1.b = t2.b); ================================================================== The results look like this: - actual rows: 100000000 - estimated (no stats): 1003638 - estimated (stats, 100): 100247844 - estimated (stats, 10k): 100000000 So, in this case the extended stats help quite a bit, even with the default statistics target. However, there are other things we can do. For example, we can use restrictions (at relation level) as "conditions" to filter the MCV lits, and calculate conditional probability. This is useful even if there's just a single join condition (on one column), but there are dependencies between that column and the other filters. Or maybe when there are filters between conditions on the two sides. Consider for example these two queries: SELECT * FROM t t1 JOIN t t2 ON (t1.a = t2.a AND t1.b = t2.b) WHERE t1.c < 25 AND t2.d < 25; SELECT * FROM t t1 JOIN t t2 ON (t1.a = t2.a AND t1.b = t2.b) WHERE t1.c < 25 AND t2.d > 75; In this particular case we know that (a = b = c = d) so the two filters are somewhat redundant. The regular estimates will ignore that, but with MCV we can actually detect that - when we combine the two MCV lists, we essentially calculate MCV (a,b,t1.c,t2.d), and use that. Q1 Q2 - actual rows: 25000000 0 - estimated (no stats): 62158 60241 - estimated (stats, 100): 25047900 1 - estimated (stats, 10k): 25000000 1 Obviously, the accuracy depends on how representative the MCV list is (what fraction of the data it represents), and in this case it works fairly nicely. A lot of the future work will be about handling cases when it represents only part of the data. The attached PoC patch has a number of FIXME and XXX, describing stuff I ignored to keep it simple, possible future improvement. And so on. regards [1] https://www.postgresql.org/message-id/flat/71d67391-16a9-3e5e-b5e4-8f7fd32cc...@postgrespro.ru -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c index d263ecf082..dca1e7d34e 100644 --- a/src/backend/optimizer/path/clausesel.c +++ b/src/backend/optimizer/path/clausesel.c @@ -157,6 +157,23 @@ clauselist_selectivity_ext(PlannerInfo *root, &estimatedclauses, false); } + /* + * Try applying extended statistics to joins. There's not much we can + * do to detect when this makes sense, but we can check that there are + * join clauses, and that at least some of the rels have stats. + * + * XXX Isn't this mutualy exclusive with the preceding block which + * calculates estimates for a single relation? + */ + if (use_extended_stats && + statext_try_join_estimates(root, clauses, varRelid, jointype, sjinfo, + estimatedclauses)) + { + s1 *= statext_clauselist_join_selectivity(root, clauses, varRelid, + jointype, sjinfo, + &estimatedclauses); + } + /* * Apply normal selectivity estimates for remaining clauses. We'll be * careful to skip any clauses which were already estimated above. diff --git a/src/backend/statistics/extended_stats.c b/src/backend/statistics/extended_stats.c index 8c75690fce..fec11ad9b5 100644 --- a/src/backend/statistics/extended_stats.c +++ b/src/backend/statistics/extended_stats.c @@ -30,6 +30,7 @@ #include "nodes/nodeFuncs.h" #include "optimizer/clauses.h" #include "optimizer/optimizer.h" +#include "optimizer/pathnode.h" #include "pgstat.h" #include "postmaster/autovacuum.h" #include "statistics/extended_stats_internal.h" @@ -1154,6 +1155,36 @@ has_stats_of_kind(List *stats, char requiredkind) return false; } +/* + * has_matching_mcv + * Check whether the list contains statistic of a given kind + * + * XXX Should consider both attnums and expressions. Also should consider + * restrictinfos as conditions. + */ +StatisticExtInfo * +find_matching_mcv(List *stats, Bitmapset *attnums) +{ + ListCell *l; + StatisticExtInfo *found = NULL; +elog(WARNING, "find_matching_mcv %d", bms_num_members(attnums)); + foreach(l, stats) + { + StatisticExtInfo *stat = (StatisticExtInfo *) lfirst(l); + + if (stat->kind != STATS_EXT_MCV) + continue; + + if (!bms_is_subset(attnums, stat->keys)) + continue; + + if (!found || (bms_num_members(found->keys) > bms_num_members(stat->keys))) + found = stat; + } + + return found; +} + /* * stat_find_expression * Search for an expression in statistics object's list of expressions. @@ -2571,3 +2602,642 @@ make_build_data(Relation rel, StatExtEntry *stat, int numrows, HeapTuple *rows, return result; } + +static bool * +statext_mcv_eval_conditions(PlannerInfo *root, RelOptInfo *rel, + StatisticExtInfo *stat, MCVList *mcv, + Selectivity *sel) +{ + ListCell *lc; + List *conditions = NIL; + + /* extract conditions that may be applied to the MCV list */ + foreach (lc, rel->baserestrictinfo) + { + RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc); + Bitmapset *attnums = NULL; + List *exprs = NIL; + + /* clause has to be supported by MCV in general */ + if (!statext_is_compatible_clause(root, (Node *) rinfo, rel->relid, + &attnums, &exprs)) + continue; + + /* + * clause has to be covered by the statistics object + * + * FIXME handle expressions properly + */ + if (!bms_is_subset(attnums, stat->keys)) + continue; + + conditions = lappend(conditions, rinfo->clause); + } + + /* everything matches by default */ + *sel = 1.0; + + if (!conditions) + return NULL; + + /* what's the selectivity of the conditions alone? */ + *sel = clauselist_selectivity(root, conditions, rel->relid, 0, NULL); + + return mcv_get_match_bitmap(root, conditions, stat->keys, stat->exprs, + mcv, false); +} + +static double +statext_ndistinct_estimate(PlannerInfo *root, RelOptInfo *rel, List *clauses) +{ + ListCell *lc; + + List *exprs = NIL; + + foreach (lc, clauses) + { + ListCell *lc2; + Node *clause = (Node *) lfirst(lc); + OpExpr *opexpr; + + if (!is_opclause(clause)) + continue; + + opexpr = (OpExpr *) clause; + + if (list_length(opexpr->args) != 2) + continue; + + foreach (lc2, opexpr->args) + { + Node *expr = (Node *) lfirst(lc2); + Bitmapset *varnos = pull_varnos(root, expr); + + if (bms_singleton_member(varnos) == rel->relid) + exprs = lappend(exprs, expr); + } + } + + return estimate_num_groups(root, exprs, rel->rows, NULL, NULL); +} + +/* + * statext_compare_mcvs + * Calculte join selectivity using extended statistics, similarly to + * eqjoinsel_inner. + * + * Considers restrictions on base relations too, essentially computing + * a conditional probability + * + * P(join clauses | baserestrictinfos on either side) + */ +Selectivity +statext_compare_mcvs(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2, + StatisticExtInfo *stat1, StatisticExtInfo *stat2, + List *clauses) +{ + MCVList *mcv1; + MCVList *mcv2; + int i, j; + Selectivity s = 0; + + /* items eliminated by conditions (if any) */ + bool *conditions1 = NULL, + *conditions2 = NULL; + + double conditions1_sel = 1.0, + conditions2_sel = 1.0; + + bool *matches1 = NULL, + *matches2 = NULL; + + double matchfreq1, + unmatchfreq1, + matchfreq2, + unmatchfreq2, + otherfreq1, + mcvfreq1, + otherfreq2, + mcvfreq2; + + double nd1, + nd2; + + double totalsel1, + totalsel2; + + mcv1 = statext_mcv_load(stat1->statOid); + mcv2 = statext_mcv_load(stat2->statOid); + + /* should only get here with MCV on both sides */ + Assert(mcv1 && mcv2); + + matches1 = (bool *) palloc0(sizeof(bool) * mcv1->nitems); + matches2 = (bool *) palloc0(sizeof(bool) * mcv2->nitems); + + /* apply baserestrictinfo conditions on the MCV lists */ + + conditions1 = statext_mcv_eval_conditions(root, rel1, stat1, mcv1, + &conditions1_sel); + + conditions2 = statext_mcv_eval_conditions(root, rel2, stat2, mcv2, + &conditions2_sel); + + /* + * Match items from the two MCV lits. + * + * We don't know if the matches are 1:1 - we may have overlap on only + * a subset of attributes, e.g. (a,b,c) vs. (b,c,d), so there may be + * multiple matches. + */ + for (i = 0; i < mcv1->nitems; i++) + { + /* skip items eliminated by restrictions on rel1 */ + if (conditions1 && !conditions1[i]) + continue; + + /* find matches in the second MCV list */ + for (j = 0; j < mcv2->nitems; j++) + { + ListCell *lc; + bool items_match = true; + + /* skip items eliminated by restrictions on rel2 */ + if (conditions2 && !conditions2[j]) + continue; + + foreach (lc, clauses) + { + Node *clause = (Node *) lfirst(lc); + Bitmapset *atts1 = NULL; + Bitmapset *atts2 = NULL; + Datum value1, value2; + int index1, index2; + AttrNumber attnum1; + AttrNumber attnum2; + bool match; + + FmgrInfo opproc; + OpExpr *expr = (OpExpr *) clause; + + Assert(is_opclause(clause)); + + fmgr_info(get_opcode(expr->opno), &opproc); + + /* determine the columns in each statistics object */ + + pull_varattnos(clause, rel1->relid, &atts1); + attnum1 = bms_singleton_member(atts1) + FirstLowInvalidHeapAttributeNumber; + index1 = bms_member_index(stat1->keys, attnum1); + + pull_varattnos(clause, rel2->relid, &atts2); + attnum2 = bms_singleton_member(atts2) + FirstLowInvalidHeapAttributeNumber; + index2 = bms_member_index(stat2->keys, attnum2); + + /* if either value is null, we're done */ + if (mcv1->items[i].isnull[index1] || mcv2->items[j].isnull[index2]) + match = false; + else + { + value1 = mcv1->items[i].values[index1]; + value2 = mcv2->items[j].values[index2]; + + /* + * FIXME Might have issues with order of parameters, but for + * same-type equality that should not matter. + * */ + match = DatumGetBool(FunctionCall2Coll(&opproc, + InvalidOid, + value1, value2)); + } + + items_match &= match; + + if (!items_match) + break; + } + + if (items_match) + { + matches1[i] = matches2[j] = true; + s += mcv1->items[i].frequency * mcv2->items[j].frequency; + + /* XXX Do we need to do something about base frequency? */ + } + } + } + + matchfreq1 = unmatchfreq1 = mcvfreq1 = 0.0; + for (i = 0; i < mcv1->nitems; i++) + { + mcvfreq1 += mcv1->items[i].frequency; + + if (conditions1 && !conditions1[i]) + continue; + + if (matches1[i]) + matchfreq1 += mcv1->items[i].frequency; + else + unmatchfreq1 += mcv1->items[i].frequency; + } + + /* not represented by the MCV */ + otherfreq1 = 1 - mcvfreq1; + + matchfreq2 = unmatchfreq2 = mcvfreq2 = 0.0; + for (i = 0; i < mcv2->nitems; i++) + { + mcvfreq2 += mcv2->items[i].frequency; + + if (conditions2 && !conditions2[i]) + continue; + + if (matches2[i]) + matchfreq2 += mcv2->items[i].frequency; + else + unmatchfreq2 += mcv2->items[i].frequency; + } + + /* not represented by the MCV */ + otherfreq2 = 1 - mcvfreq2; + + /* correction for MCV parts eliminated by the conditions */ + s = s * mcvfreq1 * mcvfreq2 / (matchfreq1 + unmatchfreq1) / (matchfreq2 + unmatchfreq2); + + nd1 = statext_ndistinct_estimate(root, rel1, clauses); + nd2 = statext_ndistinct_estimate(root, rel2, clauses); + + /* + * XXX this is a bit bogus, because we don't know what fraction of + * distinct combinations is covered by the MCV list (we're only + * dealing with some of the columns), so we can't use the same + * formular as eqjoinsel_inner exactly. Moreover, we need to look + * at the conditions. So instead we simply assume the conditions + * affect the distinct groups, and use that. + */ + nd1 *= conditions1_sel; + nd2 *= conditions2_sel; + + totalsel1 = s; + totalsel1 += unmatchfreq1 * otherfreq2 / nd2; + totalsel1 += otherfreq1 * (otherfreq2 + unmatchfreq2) / nd2; + +// if (nd2 > mcvb->nitems) +// totalsel1 += unmatchfreq1 * otherfreq2 / (nd2 - mcvb->nitems); +// if (nd2 > nmatches) +// totalsel1 += otherfreq1 * (otherfreq2 + unmatchfreq2) / +// (nd2 - nmatches); + + totalsel2 = s; + totalsel2 += unmatchfreq2 * otherfreq1 / nd1; + totalsel2 += otherfreq2 * (otherfreq1 + unmatchfreq1) / nd1; + +// if (nd1 > mcva->nitems) +// totalsel2 += unmatchfreq2 * otherfreq1 / (nd1 - mcva->nitems); +// if (nd1 > nmatches) +// totalsel2 += otherfreq2 * (otherfreq1 + unmatchfreq1) / +// (nd1 - nmatches); + + s = Min(totalsel1, totalsel2); + + return s; +} + +static bool +is_supported_join_clause(Node *clause) +{ + Oid oprsel; + RestrictInfo *rinfo; + + /* XXX Not sure we can rely on only getting RestrictInfo here? */ + if (!IsA(clause, RestrictInfo)) + return false; + + /* strip the RestrictInfo */ + rinfo = (RestrictInfo *) clause; + clause = (Node *) rinfo->clause; + + /* skip clauses that don't link two base relations */ + if (bms_num_members(rinfo->clause_relids) != 2) + return false; + + /* we only support simple operator clauses for now */ + if (!is_opclause(clause)) + return false; + + oprsel = get_oprjoin(((OpExpr *) clause)->opno); + + if (oprsel != F_EQJOINSEL) + return false; + + /* + * FIXME More thorought check that it's Var = Var or something like + * that with expressions. Maybe also check that both relations have + * extended statistics (no point in matching without it). + * + * XXX Also check it's not expression on system attributes, which we + * don't allow in extended statistics. + * + * XXX Although maybe we could allow cases that combine expressions + * from both relations on either side? Like (t1.a + t2.b = t1.c - t2.d) + * or something like that. We could do "cartesian product" of the MCV + * stats and restrict it using this condition. + */ + + return true; +} + +/* + * statext_try_join_estimates + * Checks if it's worth considering extended stats on join estimates. + */ +bool +statext_try_join_estimates(PlannerInfo *root, List *clauses, int varRelid, + JoinType jointype, SpecialJoinInfo *sjinfo, + Bitmapset *estimatedclauses) +{ + int listidx; + int k; + ListCell *lc; + Bitmapset *relids = NULL; + + /* XXX see treat_as_join_clause */ + if ((varRelid != 0) || (sjinfo == NULL)) + return false; + + listidx = -1; + foreach (lc, clauses) + { + Node *clause = (Node *) lfirst(lc); + RestrictInfo *rinfo; + listidx++; + + /* skip estimated clauses */ + if (bms_is_member(listidx, estimatedclauses)) + continue; + + /* is_supported_join_clause ensures we have RestrictInfo */ + if (!is_supported_join_clause(clause)) + continue; + + rinfo = (RestrictInfo *) clause; + + relids = bms_union(relids, rinfo->clause_relids); + } + + /* no join clauses found, don't try applying extended stats */ + if (bms_num_members(relids) == 0) + return false; + + /* + * Check that at least some of the rels have extended stats. + * + * XXX Maybe we should check how many rels have stats, and cross-check + * how compatible they are (e.g. that both have MCVs, etc.). Also, + * maybe this should cross-check the exact pairs of rels with a join + * clause between them? + * + * XXX We could also check if there are enough parameters in each rel + * to consider extended stats. If there's just a single attribute, it's + * probably better to use just regular statistics. OTOH we can also + * consider restriction clauses from baserestrictinfo and use them + * to calculate conditional probabilities. + */ + k = -1; + while ((k = bms_next_member(relids, k)) >= 0) + { + RelOptInfo *rel = find_base_rel(root, k); + if (rel->statlist) + return true; + } + + return false; +} + +/* Information about a join between two relations. */ +typedef struct JoinPairInfo +{ + Bitmapset *rels; + List *clauses; +} JoinPairInfo; + +/* + * statext_build_join_pairs + * Extract pairs of joined rels with join clauses for each pair. + */ +static JoinPairInfo * +statext_build_join_pairs(List *clauses, Bitmapset *estimatedclauses, int *npairs) +{ + int cnt; + int listidx; + JoinPairInfo *info; + ListCell *lc; + + /* + * Assume each clause is for a different pair of relations (some of them + * might be already estimated, but meh - there shouldn't be too many of + * them and it's cheaper than repalloc. + */ + info = (JoinPairInfo *) palloc0(sizeof(JoinPairInfo) * list_length(clauses)); + cnt = 0; + + listidx = -1; + foreach(lc, clauses) + { + int i; + bool found; + Node *clause = (Node *) lfirst(lc); + RestrictInfo *rinfo; + + listidx++; + + /* skip already estimated clauses */ + if (bms_is_member(listidx, estimatedclauses)) + continue; + + /* is_supported_join_clause ensures it's a restrict info */ + if (!is_supported_join_clause(clause)) + continue; + + rinfo = (RestrictInfo *) clause; + clause = (Node *) rinfo->clause; + + /* search for a matching join pair */ + found = false; + for (i = 0; i < cnt; i++) + { + if (bms_is_subset(rinfo->clause_relids, info[i].rels)) + { + info[i].clauses = lappend(info[i].clauses, clause); + found = true; + break; + } + } + + if (!found) + { + info[cnt].rels = rinfo->clause_relids; + info[cnt].clauses = lappend(info[cnt].clauses, clause); + cnt++; + } + } + + if (cnt == 0) + return NULL; + + *npairs = cnt; + return info; +} + +static RelOptInfo * +extract_relation_info(PlannerInfo *root, JoinPairInfo *info, int index, + Bitmapset **attnums, StatisticExtInfo **stat); + +/* + * statext_clauselist_join_selectivity + * Use extended stats to estimate join clauses. + * + * XXX In principle, we should not restrict this to cases with multiple + * join clauses - we should consider dependencies with conditions at the + * base relations, i.e. calculate P(join clause | base restrictions). + * But currently that does not happen, because clauselist_selectivity_ext + * treats a single clause as a special case (and we don't apply extended + * statistics in that case yet). + */ +Selectivity +statext_clauselist_join_selectivity(PlannerInfo *root, List *clauses, int varRelid, + JoinType jointype, SpecialJoinInfo *sjinfo, + Bitmapset **estimatedclauses) +{ + int i; + int listidx; + Selectivity s = 1.0; + + JoinPairInfo *info; + int ninfo; + + if (!clauses) + return 1.0; + + /* extract pairs of joined relations from the list of clauses */ + info = statext_build_join_pairs(clauses, *estimatedclauses, &ninfo); +elog(WARNING, "statext_build_join_pairs = %p", info); + /* no useful join pairs */ + if (!info) + return 1.0; + + /* + * Process the join pairs, try to find a matching MCV on each side. + * + * XXX The basic principle is quite similar to eqjoinsel_inner, i.e. + * we try to find a MCV on both sides of the join, and use it to get + * better join estimate. It's a bit more complicated, because there + * might be multiple MCV lists, we also need ndistinct estimate, and + * there may be interesting baserestrictions too. + * + * XXX At the moment we only handle the case with matching MCVs on + * both sides, but it'd be good to also handle case with just ndistinct + * statistics improving ndistinct estimates. + * + * XXX Perhaps it'd be good to also handle case with one side only + * having "regular" statistics (e.g. MCV), especially in cases with + * no conditions on that side of the join (where we can't use the + * extended MCV to calculate conditional probability). + */ + for (i = 0; i < ninfo; i++) + { + RelOptInfo *rel1; + RelOptInfo *rel2; + Bitmapset *attnos1 = NULL; + Bitmapset *attnos2 = NULL; + StatisticExtInfo *stat1; + StatisticExtInfo *stat2; + + ListCell *lc; + + /* extract info about the first relation */ + rel1 = extract_relation_info(root, &info[i], 0, &attnos1, &stat1); + + /* extract info about the second relation */ + rel2 = extract_relation_info(root, &info[i], 1, &attnos2, &stat2); + + /* XXX only handling case with MCV on both sides for now */ + if (!stat1 || !stat2) + continue; + + s *= statext_compare_mcvs(root, rel1, rel2, stat1, stat2, info[i].clauses); + + /* + * Now mark all the clauses for this pair as estimated. + * + * XXX Maybe track the indexes in JoinPairInfo, so that we can + * simply union the two bitmaps, without the extra matching. + */ + foreach (lc, info->clauses) + { + Node *clause = (Node *) lfirst(lc); + ListCell *lc2; + + listidx = -1; + foreach (lc2, clauses) + { + Node *clause2 = (Node *) lfirst(lc2); + listidx++; + + Assert(IsA(clause2, RestrictInfo)); + + clause2 = (Node *) ((RestrictInfo *) clause2)->clause; + + if (equal(clause, clause2)) + { + *estimatedclauses = bms_add_member(*estimatedclauses, listidx); + break; + } + } + } + } + + return s; +} + +static RelOptInfo * +extract_relation_info(PlannerInfo *root, JoinPairInfo *info, int index, + Bitmapset **attnums, StatisticExtInfo **stat) +{ + int k; + int relid; + RelOptInfo *rel; + + k = -1; + while (index >= 0) + { + k = bms_next_member(info->rels, k); + if (k < 0) + elog(ERROR, "failed to extract relid"); + + relid = k; + index--; + } + + rel = find_base_rel(root, relid); + + /* + * extract attnums from the clauses, and remove the offset (we don't + * bother with system attributes) + * + * FIXME This is wrong, we need to match the clauses to both attnums + * and expressions to extended statistics objects. + */ + pull_varattnos((Node *) info->clauses, relid, attnums); + + k = -1; + while ((k = bms_next_member(*attnums, k)) >= 0) + { + *attnums = bms_del_member(*attnums, k); + *attnums = bms_add_member(*attnums, k + FirstLowInvalidHeapAttributeNumber); + } + + *stat = find_matching_mcv(rel->statlist, *attnums); + + return rel; +} diff --git a/src/backend/statistics/mcv.c b/src/backend/statistics/mcv.c index 2a00fb4848..5410360653 100644 --- a/src/backend/statistics/mcv.c +++ b/src/backend/statistics/mcv.c @@ -1597,7 +1597,7 @@ mcv_match_expression(Node *expr, Bitmapset *keys, List *exprs, Oid *collid) * & and |, which should be faster than min/max. The bitmaps are fairly * small, though (thanks to the cap on the MCV list size). */ -static bool * +bool * mcv_get_match_bitmap(PlannerInfo *root, List *clauses, Bitmapset *keys, List *exprs, MCVList *mcvlist, bool is_or) diff --git a/src/include/statistics/extended_stats_internal.h b/src/include/statistics/extended_stats_internal.h index 55cd9252a5..072085365c 100644 --- a/src/include/statistics/extended_stats_internal.h +++ b/src/include/statistics/extended_stats_internal.h @@ -127,4 +127,8 @@ extern Selectivity mcv_clause_selectivity_or(PlannerInfo *root, Selectivity *overlap_basesel, Selectivity *totalsel); +extern bool *mcv_get_match_bitmap(PlannerInfo *root, List *clauses, + Bitmapset *keys, List *exprs, + MCVList *mcvlist, bool is_or); + #endif /* EXTENDED_STATS_INTERNAL_H */ diff --git a/src/include/statistics/statistics.h b/src/include/statistics/statistics.h index 326cf26fea..967b2ff0db 100644 --- a/src/include/statistics/statistics.h +++ b/src/include/statistics/statistics.h @@ -120,10 +120,25 @@ extern Selectivity statext_clauselist_selectivity(PlannerInfo *root, Bitmapset **estimatedclauses, bool is_or); extern bool has_stats_of_kind(List *stats, char requiredkind); +extern StatisticExtInfo *find_matching_mcv(List *stats, Bitmapset *attnums); extern StatisticExtInfo *choose_best_statistics(List *stats, char requiredkind, Bitmapset **clause_attnums, List **clause_exprs, int nclauses); extern HeapTuple statext_expressions_load(Oid stxoid, int idx); +extern bool statext_try_join_estimates(PlannerInfo *root, List *clauses, int varRelid, + JoinType jointype, SpecialJoinInfo *sjinfo, + Bitmapset *estimatedclauses); + +extern Selectivity statext_clauselist_join_selectivity(PlannerInfo *root, List *clauses, + int varRelid, + JoinType jointype, SpecialJoinInfo *sjinfo, + Bitmapset **estimatedclauses); + +extern Selectivity statext_compare_mcvs(PlannerInfo *root, + RelOptInfo *rela, RelOptInfo *relb, + StatisticExtInfo *sa, StatisticExtInfo *sb, + List *clauses); + #endif /* STATISTICS_H */