On 18 March 2018 at 23:57, Tomas Vondra <tomas.von...@2ndquadrant.com> wrote: > Attached is an updated version of the patch series, addressing issues > pointed out by Alvaro.
I've just been reading the new code in statext_clauselist_selectivity() and mcv_clauselist_selectivity(), and I'm having a hard time convincing myself that it's correct. This code in statext_clauselist_selectivity() looks a bit odd: /* * Evaluate the MCV selectivity. See if we got a full match and the * minimal selectivity. */ if (stat->kind == STATS_EXT_MCV) s1 = mcv_clauselist_selectivity(root, stat, clauses, varRelid, jointype, sjinfo, rel, &fullmatch, &mcv_lowsel); /* * If we got a full equality match on the MCV list, we're done (and the * estimate is likely pretty good). */ if (fullmatch && (s1 > 0.0)) return s1; /* * If it's a full match (equalities on all columns) but we haven't found * it in the MCV, then we limit the selectivity by frequency of the last * MCV item. Otherwise reset it to 1.0. */ if (!fullmatch) mcv_lowsel = 1.0; return Min(s1, mcv_lowsel); So if fullmatch is true and s1 is greater than 0, it will return s1. If fullmatch is true and s1 equals 0, it will return Min(s1, mcv_lowsel) which will also be s1. If fullmatch is false, mcv_lowsel will be set to 1 and it will return Min(s1, mcv_lowsel) which will also be s1. So it always just returns s1, no? Maybe there's no point in computing fullmatch. Also, wouldn't mcv_lowsel potentially be a significant overestimate anyway? Perhaps 1 minus the sum of the MCV frequencies might be closer, but even that ought to take into account the number of distinct values remaining, although that information may not always be available. Also, just above that, in statext_clauselist_selectivity(), it computes the list stat_clauses, then doesn't appear to use it anywhere. I think that would have been the appropriate thing to pass to mcv_clauselist_selectivity(). Otherwise, passing unrelated clauses into mcv_clauselist_selectivity() will cause it to fail to find any matches and then underestimate. I've also come across a few incorrect/out-of-date comments: /* * mcv_clauselist_selectivity * Return the estimated selectivity of the given clauses using MCV list * statistics, or 1.0 if no useful MCV list statistic exists. */ -- I can't see any code path that returns 1.0 if there are no MCV stats. The last part of that comment is probably more appropriate to statext_clauselist_selectivity() /* * mcv_update_match_bitmap * [snip] * The function returns the number of items currently marked as 'match', and * ... -- it doesn't seem to return the number of items marked as 'match'. Then inside that function, this comment is wrong (copied from the preceding comment): /* AND clauses assume nothing matches, initially */ memset(bool_matches, STATS_MATCH_FULL, sizeof(char) * mcvlist->nitems); Still reading... Regards, Dean