On Thu, Jan 5, 2017 at 3:27 AM, Tomas Vondra <tomas.von...@2ndquadrant.com> wrote:
> Thanks. Those plans match my experiments with the TPC-H data set, although
> I've been playing with the smallest scale (1GB).
>
> It's not very difficult to make the estimation error arbitrary large, e.g.
> by using perfectly correlated (identical) columns.
I have done an initial review of the ndistinct and histogram patches; here are a few review comments.

ndistinct
---------

1. Duplicate statistics:

postgres=# create statistics s with (ndistinct) on (a,c) from t;
2017-01-07 16:21:54.575 IST [63817] ERROR:  duplicate key value violates unique constraint "pg_mv_statistic_name_index"
2017-01-07 16:21:54.575 IST [63817] DETAIL:  Key (staname, stanamespace)=(s, 2200) already exists.
2017-01-07 16:21:54.575 IST [63817] STATEMENT:  create statistics s with (ndistinct) on (a,c) from t;
ERROR:  duplicate key value violates unique constraint "pg_mv_statistic_name_index"
DETAIL:  Key (staname, stanamespace)=(s, 2200) already exists.

For duplicate statistics, I think we can check for the existence of the statistics first and report a more meaningful error, something like: statistics "s" already exists.

2. Typo:

+   /*
+    * Sort the attnums, which makes detecting duplicies somewhat
+    * easier, and it does not hurt (it does not affect the efficiency,
+    * onlike for indexes, for example).
+    */

/onlike/unlike

(The same comment also has "duplicies", which should be "duplicates".)

3. Typo:

/*
 * Find attnims of MV stats using the mvoid.
 */
int2vector *
find_mv_attnums(Oid mvoid, Oid *relid)

/attnims/attnums

histograms
----------

+   if (matches[i] == MVSTATS_MATCH_FULL)
+       s += mvhist->buckets[i]->ntuples;
+   else if (matches[i] == MVSTATS_MATCH_PARTIAL)
+       s += 0.5 * mvhist->buckets[i]->ntuples;

Wouldn't it be better to take some fraction of the bucket, based on the number of distinct elements, for partially matching buckets instead of a flat 0.5?

+static int
+update_match_bitmap_histogram(PlannerInfo *root, List *clauses,
+                              int2vector *stakeys,
+                              MVSerializedHistogram mvhist,
+                              int nmatches, char *matches,
+                              bool is_or)
+{
+   int     i;

For each clause we are processing all the buckets. Can't we use some data structure that makes searching multi-dimensional information faster, something like an H-tree or R-tree? Maybe storing the histogram in these formats would be difficult?
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers