On Thu, 28 Feb 2019 at 19:56, Tomas Vondra <tomas.von...@2ndquadrant.com> wrote: > Attached is an updated version of this patch series.
Here are some random review comments. I'll add more later, but I'm out of energy for today. 1). src/test/regress/expected/type_sanity.out has bit-rotted. 2). Duplicate OIDs (3425). 3). It looks a bit odd that clauselist_selectivity() calls statext_clauselist_selectivity(), which does MCV stats and will do histograms, but it doesn't do dependencies, so clauselist_selectivity() has to then separately call dependencies_clauselist_selectivity(). It would seem neater if statext_clauselist_selectivity() took care of calling dependencies_clauselist_selectivity(), since dependencies are just another kind of extended stats. 4). There are no tests for pg_mcv_list_items(). Given a table with a small enough amount of data, so that it's all sampled, it ought to be possible to get predictable MCV stats. 5). It's not obvious what some of the new test cases in the "stats_ext" tests are intended to show. For example, the first test creates a table with 5000 rows and a couple of indexes, does a couple of queries, builds some MCV stats, and then repeats the queries, but the results seem to be the same with and without the stats. I wonder if it's possible to write smaller, more targeted tests. Currently "stats_ext" is by far the slowest test in its group, and I'm not sure that some of those tests add much. It ought to be possible to write a function that calls EXPLAIN and returns a query's row estimate, and then you could write tests to confirm the effect of the new stats by verifying the row estimates change as expected. 6). This enum isn't needed for MCVs: /* * Degree of how much MCV item matches a clause. * This is then considered when computing the selectivity. */ #define STATS_MATCH_NONE 0 /* no match at all */ #define STATS_MATCH_PARTIAL 1 /* partial match */ #define STATS_MATCH_FULL 2 /* full match */ STATS_MATCH_PARTIAL is never used for MCVs, so you may as well just use booleans instead of this enum. If those are needed for histograms, they can probably be made local to histogram.c. 7). estimate_num_groups_simple() isn't needed in this patch. 8). In README.mcv, s/clauselist_mv_selectivity_mcvlist/mcv_clauselist_selectivity/. 9). In the list of supported clause types that follows (e) is the same a (c), but with a more general description. 10). It looks like most of the subsequent description of the algorithm is out of date and needs rewriting. All the stuff about full matches and the use of ndistinct is now obsolete. Regards, Dean