Somebody wrote to me a few days ago that the BRIN cost estimation is rather poor. One immediately obvious issue which I think is easily fixed is the correlation estimate, which is currently hardcoded to 1.
Since a BRIN scan always entails a scan of the relation in physical order, it's simple to produce a better estimate for that, per the attached patch. (Note: trying to run this on expression indexes will cause a crash. I need to complete that part ...) There are other improvements we've been discussing, but I'm unclear that I will have time to study them to get a useful patch quickly enough. If somebody else wants to mess with this, please get in touch. Thoughts? -- Álvaro Herrera 33.5S 70.5W
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c index 37fad86..5506f67 100644 --- a/src/backend/utils/adt/selfuncs.c +++ b/src/backend/utils/adt/selfuncs.c @@ -7531,6 +7531,7 @@ brincostestimate(PG_FUNCTION_ARGS) Cost spc_random_page_cost; double qual_op_cost; double qual_arg_cost; + VariableStatData vardata; /* Do preliminary analysis of indexquals */ qinfos = deconstruct_indexquals(path); @@ -7544,6 +7545,8 @@ brincostestimate(PG_FUNCTION_ARGS) * BRIN indexes are always read in full; use that as startup cost. * * XXX maybe only include revmap pages here? + * XXX we should consider the revmap at seqpage cost, and regular pages + * at random page cost. */ *indexStartupCost = spc_seq_page_cost * numPages * loop_count; @@ -7558,7 +7561,88 @@ brincostestimate(PG_FUNCTION_ARGS) clauselist_selectivity(root, indexQuals, path->indexinfo->rel->relid, JOIN_INNER, NULL); - *indexCorrelation = 1; + + /* + * Compute index correlation. + * + * Because we can use all index quals equally when scanning, we can use the + * largest correlation (in absolute value) among columns used by the query. + * Start at zero, the worst possible case. + */ + *indexCorrelation = 0; + + { + RangeTblEntry *rte = planner_rt_fetch(index->rel->relid, root); + Oid relid; + ListCell *cell; + + Assert(rte->rtekind == RTE_RELATION); + relid = rte->relid; + Assert(relid != InvalidOid); + + foreach (cell, qinfos) + { + IndexQualInfo *qinfo = (IndexQualInfo *) lfirst(cell); + AttrNumber colnum = index->indexkeys[qinfo->indexcol]; + + if (colnum != 0) + { + /* Simple variable -- look to stats for the underlying table */ + if (get_relation_stats_hook && + (*get_relation_stats_hook) (root, rte, colnum, &vardata)) + { + /* + * The hook took control of acquiring a stats tuple. If it did + * supply a tuple, it'd better have supplied a freefunc. + */ + if (HeapTupleIsValid(vardata.statsTuple) && + !vardata.freefunc) + elog(ERROR, "no function provided to release variable stats with"); + } + else + { + vardata.statsTuple = SearchSysCache3(STATRELATTINH, + ObjectIdGetDatum(relid), + Int16GetDatum(colnum), + /* XXX no inh */ + BoolGetDatum(false)); + vardata.freefunc = ReleaseSysCache; + } + + if (HeapTupleIsValid(vardata.statsTuple)) + { + float4 *numbers; + int nnumbers; + + /* XXX is InvalidOID reqop fine?? */ + if (get_attstatsslot(vardata.statsTuple, InvalidOid, 0, + STATISTIC_KIND_CORRELATION, + InvalidOid, + NULL, + NULL, NULL, + &numbers, &nnumbers)) + { + double varCorrelation; + + Assert(nnumbers == 1); + varCorrelation = numbers[0]; + + if (Abs(varCorrelation) > Abs(*indexCorrelation)) + *indexCorrelation = varCorrelation; + + free_attstatsslot(InvalidOid, NULL, 0, numbers, nnumbers); + } + } + } + else + { + /* Expression --- maybe there are stats for the index itself */ + Assert(false); + } + + ReleaseVariableStats(vardata); + } + } /* * Add on index qual eval costs, much as in genericcostestimate.
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers