On Mon, Dec 07, 2020 at 03:15:17PM +0100, Tomas Vondra wrote: > > Looking at the current behaviour, there are a couple of things that > > seem a little odd, even though they are understandable. For example, > > the fact that > > > > CREATE STATISTICS s (expressions) ON (expr), col FROM tbl; > > > > fails, but > > > > CREATE STATISTICS s (expressions, mcv) ON (expr), col FROM tbl; > > > > succeeds and creates both "expressions" and "mcv" statistics. Also, the > > syntax > > > > CREATE STATISTICS s (expressions) ON (expr1), (expr2) FROM tbl; > > > > tends to suggest that it's going to create statistics on the pair of > > expressions, describing their correlation, when actually it builds 2 > > independent statistics. Also, this error text isn't entirely accurate: > > > > CREATE STATISTICS s ON col FROM tbl; > > ERROR: extended statistics require at least 2 columns > > > > because extended statistics don't always require 2 columns, they can > > also just have an expression, or multiple expressions and 0 or 1 > > columns. > > > > I think a lot of this stems from treating "expressions" in the same > > way as the other (multi-column) stats kinds, and it might actually be > > neater to have separate documented syntaxes for single- and > > multi-column statistics: > > > > CREATE STATISTICS [ IF NOT EXISTS ] statistics_name > > ON (expression) > > FROM table_name > > > > CREATE STATISTICS [ IF NOT EXISTS ] statistics_name > > [ ( statistics_kind [, ... ] ) ] > > ON { column_name | (expression) } , { column_name | (expression) } [, > > ...] > > FROM table_name > > > > The first syntax would create single-column stats, and wouldn't accept > > a statistics_kind argument, because there is only one kind of > > single-column statistic. Maybe that might change in the future, but if > > so, it's likely that the kinds of single-column stats will be > > different from the kinds of multi-column stats. > > > > In the second syntax, the only accepted kinds would be the current > > multi-column stats kinds (ndistinct, dependencies, and mcv), and it > > would always build stats describing the correlations between the > > columns listed. It would continue to build standard/expression stats > > on any expressions in the list, but that's more of an implementation > > detail. > > > > It would no longer be possible to do "CREATE STATISTICS s > > (expressions) ON (expr1), (expr2) FROM tbl". Instead, you'd have to > > issue 2 separate "CREATE STATISTICS" commands, but that seems more > > logical, because they're independent stats. > > > > The parsing code might not change much, but some of the errors would > > be different. For example, the errors "building only extended > > expression statistics on simple columns not allowed" and "extended > > expression statistics require at least one expression" would go away, > > and the error "extended statistics require at least 2 columns" might > > become more specific, depending on the stats kind.
This still seems odd: postgres=# CREATE STATISTICS asf ON i FROM t; ERROR: extended statistics require at least 2 columns postgres=# CREATE STATISTICS asf ON (i) FROM t; CREATE STATISTICS It seems wrong that the command works with added parens, but builds expression stats on a simple column (which is redundant with what analyze does without extended stats). -- Justin