Looking at 0003, I notice that gram.y is changed to add a WITH ( .. ) clause. If it's not specified, an error is raised. If you create stats with (ndistinct) then you can't alter it later to add "dependencies" or whatever; unless I misunderstand, you have to drop the statistics and create another one. Probably in a forthcoming patch we should have ALTER support to add a stats type.
Also, why isn't the default to build everything, rather than nothing?

BTW, almost everything in the backend could be inside "utils/", so let's not do that -- let's just create src/backend/statistics/ for all your code.

Here are a few notes from reading README.dependencies -- some typos, two questions.

diff --git a/src/backend/utils/mvstats/README.dependencies b/src/backend/utils/mvstats/README.dependencies
index 908f094..7f3ed3d 100644
--- a/src/backend/utils/mvstats/README.dependencies
+++ b/src/backend/utils/mvstats/README.dependencies
@@ -36,7 +36,7 @@ design choice to model the dataset in denormalized way, either because of
 performance or to make querying easier.


-soft dependencies
+Soft dependencies
 -----------------

 Real-world data sets often contain data errors, either because of data entry
@@ -48,7 +48,7 @@ rendering the approach mostly useless even for slightly noisy data sets, or
 result in sudden changes in behavior depending on minor differences between
 samples provided to ANALYZE.

-For this reason the statistics implementes "soft" functional dependencies,
+For this reason the statistics implements "soft" functional dependencies,
 associating each functional dependency with a degree of validity (a number
 number between 0 and 1). This degree is then used to combine selectivities
 in a smooth manner.
@@ -75,6 +75,7 @@ The algorithm also requires a minimum size of the group to consider it
 consistent (currently 3 rows in the sample). Small groups make it less likely
 to break the consistency.

+## What is it that we store in the catalog?

 Clause reduction (planner/optimizer)
 ------------------------------------
@@ -95,12 +96,12 @@ example for (a,b,c) we first use (a,b=>c) to break the computation into
 and then apply (a=>b) the same way on P(a=?,b=?).


-Consistecy of clauses
+Consistency of clauses
 ---------------------

 Functional dependencies only express general dependencies between columns,
 without referencing particular values. This assumes that the equality clauses
-are in fact consistent with the functinal dependency, i.e. that given a
+are in fact consistent with the functional dependency, i.e. that given a
 dependency (a=>b), the value in (b=?) clause is the value determined by (a=?).
 If that's not the case, the clauses are "inconsistent" with the functional
 dependency and the result will be over-estimation.
@@ -111,6 +112,7 @@ set will be empty, but we'll estimate the selectivity using the ZIP condition.
 In this case the default estimation based on AVIA principle happens to work
 better, but mostly by chance.

+## what is AVIA principle?
 This issue is the price for the simplicity of functional dependencies. If the
 application frequently constructs queries with clauses inconsistent with

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services