On Fri, 18 Jan 2019 at 10:03, Tomas Vondra <tomas.von...@2ndquadrant.com> wrote: > thanks for the review. The attached patches address most of the issues > mentioned in the past several messages, both in the MCV and histogram parts.
I made another pass over the 0001 patch. I've not read through mcv.c again yet. Will try to get to that soon. 0001-multivariate-MCV-lists-20190117.patch 1. The following mentions "multiple functions", but lists just 1 function. <para> To inspect statistics defined using <command>CREATE STATISTICS</command> command, <productname>PostgreSQL</productname> provides multiple functions. </para> 2. There's a mix of usages of <literal>MCV</literal> and <acronym>MCV</acronym> around the docs. Should these be the same? 3. analyze_mcv_list() is modified to make it an external function, but it's not used anywhere out of analyze.c 4. The following can be simplified further: * We can also leave the record as it is if there are no statistics * including the datum values, like for example MCV lists. */ if (statext_is_kind_built(oldtup, STATS_EXT_MCV)) reset_stats = true; /* * If we can leave the statistics as it is, just do minimal cleanup * and we're done. */ if (!reset_stats) { ReleaseSysCache(oldtup); return; } to just: /* * When none of the defined statistics types contain datum values * from the table's columns then there's no need to reset the stats. * Functional dependencies and ndistinct stats should still hold true. */ if (!statext_is_kind_built(oldtup, STATS_EXT_MCV)) { ReleaseSysCache(oldtup); return; } 5. "so that we can ignore them below." seems misplaced now since you've moved all the code below into clauselist_selectivity_simple(). Maybe you can change it to "so that we can inform clauselist_selectivity_simple about clauses that it should ignore" ? * filled with the 0-based list positions of clauses used that way, so * that we can ignore them below. 6. README.mcv: multi-variate -> multivariate are large the list may be quite large. This is especially true for multi-variate 7. README.mcv: similar -> a similar it impossible to use anyarrays. It might be possible to produce similar 8. I saw you added IS NOT NULL to README.mcv, but README just mentions: (b) MCV lists - equality and inequality clauses (AND, OR, NOT), IS NULL Should that mention IS NOT NULL too? 9. The header comment for build_attnums_array() claims that it "transforms an array of AttrNumber values into a bitmap", but it does the opposite. * Transforms an array of AttrNumber values into a bitmap. 10. The following Assert is not entirely useless. The bitmapset could have a 0 member, but it can't store negative values. while ((j = bms_next_member(attrs, j)) >= 0) { /* user-defined attributes only */ Assert(AttrNumberIsForUserDefinedAttr(j)); Just checking you thought of that when you added it? 11. XXX comments are normally reserved for things we may wish to reconsider later, but the following seems more like a "Note:" * XXX All the memory is allocated in a single chunk, so that the caller * can simply pfree the return value to release all of it. 12. In statext_is_compatible_clause_internal() there's still a comment that mentions "Var op Const", but Const op Var is also okay too. 13. This is not fall-through. Generally, such a comment is reserved to confirm that the "break;" is meant to be missing. default: /* fall-through */ return false; https://developers.redhat.com/blog/2017/03/10/wimplicit-fallthrough-in-gcc-7/ mentions various comment patterns that are used for that case. Your case seems misplaced since it's right about a return, and not another case. 14. The header comment for statext_is_compatible_clause() is not accurate. It mentions only opexprs with equality operations are allowed, but none of those are true. * Only OpExprs with two arguments using an equality operator are supported. * When returning True attnum is set to the attribute number of the Var within * the supported clause. 15. statext_clauselist_selectivity(): "a number" -> "the number" ? * Selects the best extended (multi-column) statistic on a table (measured by * a number of attributes extracted from the clauses and covered by it), and 16. I understand you're changing this to a bitmask in the 0002 patch, but int is the wrong type here; /* we're interested in MCV lists */ int types = STATS_EXT_MCV; Maybe just pass the STATS_EXT_MCV directly, or at least make it a char. 17. bms_membership(clauses_attnums) != BMS_MULTIPLE seems better here. It can stop once it finds 2. No need to count them all. /* We need at least two attributes for MCV lists. */ if (bms_num_members(clauses_attnums) < 2) return 1.0; 18. The following comment in statext_is_compatible_clause_internal() does not seem to be true. I see OpExprs are supported and NULL test, including others too. /* We only support plain Vars for now */ 19. The header comment for clauselist_selectivity_simple() does not mention what estimatedclauses is for. 20. New line. Also, missing "the" before "maximum" + * We + * iteratively search for multivariate n-distinct with maximum number 21. This comment seems like it's been copied from estimate_num_groups() without being edited. /* we're done with this relation */ varinfos = NIL; Looks like it's using this to break out of the loop. 22. I don't see any dividing going on below this comment: /* * Sanity check --- don't divide by zero if empty relation. */ 23. I see a few tests mentioning: "-- check change of unrelated column type does not reset the MCV statistics" Would it be better to just look at pg_statistic_ext there and do something like: SELECT COUNT(*) FROM pg_statistic_ext WHERE stxname = 'whatever' AND stxmcv IS NOT NULL; Otherwise, you seem to be ensuring the stats were not reset by looking at a query plan, so it's a bit harder to follow and likely testing more than it needs to. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services