On Thu, 11 Apr 2019 at 11:53, Paul Martinez <hello...@gmail.com> wrote: > > I have some questions about the different types of extended statistics > that were introduced in Postgres 10. > - Which types of queries are each statistic type supposed to improve?
Multivariate ndistinct stats are aimed to improve distinct estimates over groups of columns. These can help in cases like GROUP BY a,b, SELECT DISTINCT a,b, SELECT a,b FROM x UNION SELECT a,b FROM y; They also help in determining the number of times an index will be rescanned in cases like nested loops with a parameterised inner path. I see multivariate ndistinct estimates are not used for normal selectivity estimates for unknown values. e.g PREPARE q1 (int, int) AS SELECT * FROM t1 WHERE a = $1 and b = $2; still assumes a and b are independent even when ndistinct stats exist on the two columns. There are a few other usages too. See calls of estimate_num_groups() dependency stats just handle WHERE clauses (or more accurately, clauses containing a reference to a single relation. These only handle equality OpExprs. e.g "a = 10 and y = 3", not "a < 6 and y = 3". Further stat types (most common values) added in PG12 aim to allow inequality operators too. > - When should one type of statistic be used over the other? Should they > both always be used? If they both always should be always used then we'd likely not have bothered making the types optional. Both ndistinct and dependency stats are fairly cheap to calculate and store, so it might not be too big an issue adding both types if you're not sure. With these two types there's not really any choice for the planner to decide to use one or the other, it just makes use of the ones it can use for the given situation. That won't be the case as more stats types get added. In PG12, for example, we had to choose of MCV stats should be applied before dependencies stats. That might be a no-brainer, but perhaps the future there will be stats types where the order to apply them is not so clear, although in those cases it might be questionable why you'd want to define more than one type of stats on the same set of columns. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services