Hi, Attached is an updated version of this patch series. I've decided to rebase and send both parts (MCV and histograms), although we've agreed to focus on the MCV part for now. I don't want to leave the histogram to lag behind, because (a) then it'd be much more work to update it, and (b) I think it's an useful feedback about likely future changes.
This should address most of the issues pointed out by David in his recent reviews. Briefly: 1) It fixes/updates a number of comments and docs on various places, removes redundant comments etc. In most cases I've simply adopted the wording proposed by David, with minor tweaks in a couple of cases. 2) Reverts changes that exposed analyze_mcv_list - this was a leftover from the attempt to reuse the single-column algorithm, but we've since agreed it's not the right approach. So this change is unnecessary. 3) I've tweaked the code to accept RelabelType nodes as supported, similarly to what examine_variable() does. Previously I concluded we can't support RelabelType, but it seems that reasoning was bogus. I've slightly tweaked the regression tests by changing one of the columns to varchar, so that the queries actualy trigger this. 4) I've tweaked a couple of places (UpdateStatisticsForTypeChange, statext_clauselist_selectivity and estimate_num_groups_simple) per David's suggestions. Those were fairly straightforward simplifications. 5) I've removed mcv_count from statext_mcv_build(). As David pointed out, this was not actually needed - it was another remnant of the attempt to re-use analyze_mcv_list() which needs such array. But without it we can access the groups directly. 6) One of the review questions was about the purpose of this code: for (i = 0; i < nitems; i++) { if (groups[i].count < mincount) { nitems = i; break; } } It's quite simple - we want to include groups with more occurrences than mincount, and the groups are sorted by the count (in descending order). So we simply find the first group with count below mincount, and the index is the number of groups to keep. I've tried to explain that in a comment. 7) I've fixed a bunch of format patters in statext_mcv_deserialize(), particularly those that confused %d and %u. We can't however use %d for VARSIZE_ANY_EXHDR, because that macro expands into offsetof() etc. So that would trigger compiler warnings. 8) Yeah, pg_stats_ext_mcvlist_items was broken. The issue was that one of the output parameters is defined as boolean[], but the function was building just string. Originally it used BuildTupleFromCStrings(), but then it switched to heap_form_tuple() without building a valid array. I've decided to simply revert back to BuildTupleFromCStrings(). It's not going to be used very frequently, so the small performance difference is not important. I've also fixed the formatting issues, pointed out by David. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
0001-multivariate-MCV-lists-20190228.patch.gz
Description: application/gzip
0002-multivariate-histograms-20190228.patch.gz
Description: application/gzip