On Tue, 14 Jan 2020 at 0:00, Tomas Vondra <tomas.von...@2ndquadrant.com> wrote:
> Hi,
>
> Now that I've committed [1] which allows us to use multiple extended
> statistics per table, I'd like to start a thread discussing a couple of
> additional improvements for extended statistics. I've considered
> starting a separate patch for each, but that would be messy as those
> changes will touch roughly the same places. So I've organized it into a
> single patch series, with the simpler parts at the beginning.
>
> There are three main improvements:
>
> 1) improve estimates of OR clauses
>
> Until now, OR clauses pretty much ignored extended statistics, based on
> the experience that they're less vulnerable to misestimates. But it's a
> bit weird that AND clauses are handled while OR clauses are not, so this
> extends the logic to OR clauses.
>
> Status: I think this is fairly OK.
>
>
> 2) support estimating clauses (Var op Var)
>
> Currently, we only support clauses with a single Var, i.e. clauses like
>
>   - Var op Const
>   - Var IS [NOT] NULL
>   - [NOT] Var
>   - ...
>
> and AND/OR clauses built from those simple ones. This patch adds support
> for clauses of the form (Var op Var), of course assuming both Vars come
> from the same relation.
>
> Status: This works, but it feels a bit hackish. Needs more work.
>
>
> 3) support extended statistics on expressions
>
> Currently we only allow simple references to columns in extended stats,
> so we can do
>
>     CREATE STATISTICS s ON a, b, c FROM t;
>
> but not
>
>     CREATE STATISTICS s ON (a+b), (c + 1) FROM t;
>

+1 for statistics on expressions - it could be a great feature.

Pavel

> This patch aims to allow this. At the moment it's a WIP - it does most
> of the catalog changes and stats building, but with some hacks/bugs. And
> it does not even try to use those statistics during estimation.
>
> The first question is how to extend the current pg_statistic_ext catalog
> to support expressions. I've been planning to do it the way we support
> expressions for indexes, i.e. have two catalog fields - one for keys,
> one for expressions.
>
> One difference is that for statistics we don't care about order of the
> keys, so that we don't need to bother with storing 0 keys in place for
> expressions - we can simply assume keys are first, then expressions.
>
> And this is what the patch does now.
>
> I'm however wondering whether to keep this split - why not just treat
> everything as expressions, and be done with it? A key just represents a
> Var expression, after all. And it would massively simplify a lot of code
> that now has to care about both keys and expressions.
>
> Of course, expressions are a bit more expensive, but I wonder how
> noticeable that would be.
>
> Opinions?
>
>
> regards
>
> [1] https://commitfest.postgresql.org/26/2320/
>
> --
> Tomas Vondra                  http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
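
To make item 1) in the quoted message a bit more concrete, here is a minimal sketch of why OR clauses can still be misestimated when the columns are correlated. It is not taken from the patch: the function names, the sample data, and the idea of reading P(a AND b) from a joint statistic are assumptions for illustration only. It compares the usual independence-based estimate with inclusion-exclusion using a known joint frequency.

```python
def or_selectivity_independent(p_a: float, p_b: float) -> float:
    """Estimate P(A OR B) assuming A and B are independent."""
    return p_a + p_b - p_a * p_b


def or_selectivity_joint(p_a: float, p_b: float, p_ab: float) -> float:
    """Estimate P(A OR B) by inclusion-exclusion, given the joint P(A AND B)."""
    return p_a + p_b - p_ab


# Hypothetical sample: column b always equals column a (perfect correlation).
rows = [(i % 100, i % 100) for i in range(10_000)]
n = len(rows)

# Per-column and joint frequencies for the clauses (a = 1) and (b = 1).
p_a = sum(1 for a, b in rows if a == 1) / n
p_b = sum(1 for a, b in rows if b == 1) / n
p_ab = sum(1 for a, b in rows if a == 1 and b == 1) / n

# True fraction of rows matching (a = 1 OR b = 1).
actual = sum(1 for a, b in rows if a == 1 or b == 1) / n

independent = or_selectivity_independent(p_a, p_b)
joint = or_selectivity_joint(p_a, p_b, p_ab)

print(f"actual={actual:.4f} independent={independent:.4f} joint={joint:.4f}")
```

On this sample the independence-based estimate is roughly twice the true fraction, while the estimate using the joint frequency matches it. That is the kind of gap that a multi-column statistic could close for OR clauses, under the assumptions above.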