Re: [HACKERS] multivariate statistics (v19)

Alvaro Herrera Mon, 06 Feb 2017 14:12:50 -0800

Looking at 0003, I notice that gram.y is changed to add a WITH ( .. )
clause.  If it's not specified, an error is raised.  If you create
stats with (ndistinct) then you can't alter it later to add
"dependencies" or whatever; unless I misunderstand, you have to drop the
statistics and create another one.  Probably in a forthcoming patch we
should have ALTER support to add a stats type.


Also, why isn't the default to build everything, rather than nothing?

BTW, almost everything in the backend could be inside "utils/", so let's
not do that -- let's just create src/backend/statistics/ for all your
code.

Here a few notes while reading README.dependencies -- some typos, two
questions.

diff --git a/src/backend/utils/mvstats/README.dependencies 
b/src/backend/utils/mvstats/README.dependencies
index 908f094..7f3ed3d 100644
--- a/src/backend/utils/mvstats/README.dependencies
+++ b/src/backend/utils/mvstats/README.dependencies
@@ -36,7 +36,7 @@ design choice to model the dataset in denormalized way, 
either because of
 performance or to make querying easier.
 
 
-soft dependencies
+Soft dependencies
 -----------------
 
 Real-world data sets often contain data errors, either because of data entry
@@ -48,7 +48,7 @@ rendering the approach mostly useless even for slightly noisy 
data sets, or
 result in sudden changes in behavior depending on minor differences between
 samples provided to ANALYZE.
 
-For this reason the statistics implementes "soft" functional dependencies,
+For this reason the statistics implements "soft" functional dependencies,
 associating each functional dependency with a degree of validity (a number
 number between 0 and 1). This degree is then used to combine selectivities
 in a smooth manner.
@@ -75,6 +75,7 @@ The algorithm also requires a minimum size of the group to 
consider it
 consistent (currently 3 rows in the sample). Small groups make it less likely
 to break the consistency.
 
+## What is it that we store in the catalog?
 
 Clause reduction (planner/optimizer)
 ------------------------------------
@@ -95,12 +96,12 @@ example for (a,b,c) we first use (a,b=>c) to break the 
computation into
 and then apply (a=>b) the same way on P(a=?,b=?).
 
 
-Consistecy of clauses
+Consistency of clauses
 ---------------------
 
 Functional dependencies only express general dependencies between columns,
 without referencing particular values. This assumes that the equality clauses
-are in fact consistent with the functinal dependency, i.e. that given a
+are in fact consistent with the functional dependency, i.e. that given a
 dependency (a=>b), the value in (b=?) clause is the value determined by (a=?).
 If that's not the case, the clauses are "inconsistent" with the functional
 dependency and the result will be over-estimation.
@@ -111,6 +112,7 @@ set will be empty, but we'll estimate the selectivity using 
the ZIP condition.
 
 In this case the default estimation based on AVIA principle happens to work
 better, but mostly by chance.
+## what is AVIA principle?
 
 This issue is the price for the simplicity of functional dependencies. If the
 application frequently constructs queries with clauses inconsistent with

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] multivariate statistics (v19)

Reply via email to