I've started looking over 0002. Here are a few things so far: 1. I think this should be pg_statistic_ext.stxhistogram?
Values of the <type>pg_histogram</type> can be obtained only from the <literal>pg_statistic.stxhistogram</literal> column. 2. I don't think this bms_copy is needed anymore. I think it was previously since there were possibly multiple StatisticExtInfo objects per pg_statistic_ext row, but now it's 1 for 1. + info->keys = bms_copy(keys); naturally, the bms_free() will need to go too. 3. I've not really got into understanding how the new statistics types are applied yet, but I found this: * If asked to build both MCV and histogram, first build the MCV part * and then histogram on the remaining rows. I guess that means we'll get different estimates with: create statistic a_stats (mcv,histogram) on a,b from t; vs create statistic a_stats1 (mcv) on a,b from t; create statistic a_stats2 (histogram) on a,b from t; Is that going to be surprising to people? 4. I guess you can replace "(histogram == NULL);" with "false". The compiler would likely do it anyway, but... if (histogram != NULL) { /* histogram already is a bytea value, not need to serialize */ nulls[Anum_pg_statistic_ext_stxhistogram - 1] = (histogram == NULL); values[Anum_pg_statistic_ext_stxhistogram - 1] = PointerGetDatum(histogram); } but, hmm. Shouldn't you serialize this, like you are with the others? 5. serialize_histogram() and statext_histogram_deserialize(), should these follow the same function naming format? 6. IIRC some compilers may warn about this: if (stat->kinds & requiredkinds) making it: if ((stat->kinds & requiredkinds)) should fix that. UPDATE: Tried to make a few compilers warn about this and failed. Perhaps I've misremembered. 7. Comment claims function has a parameter named 'requiredkind', but it no longer does. The comment also needs updated to mention that it finds statistics with any of the required kinds. * choose_best_statistics * Look for and return statistics with the specified 'requiredkind' which * have keys that match at least two of the given attnums. Return NULL if * there's no match. * * The current selection criteria is very simple - we choose the statistics * object referencing the most of the requested attributes, breaking ties * in favor of objects with fewer keys overall. * * XXX If multiple statistics objects tie on both criteria, then which object * is chosen depends on the order that they appear in the stats list. Perhaps * further tiebreakers are needed. */ StatisticExtInfo * choose_best_statistics(List *stats, Bitmapset *attnums, int requiredkinds) 8. Looking at statext_clauselist_selectivity() I see it calls choose_best_statistics() passing requiredkinds as STATS_EXT_INFO_MCV | STATS_EXT_INFO_HISTOGRAM, do you think the function now needs to attempt to find the best match plus the one with the most statistics kinds? It might only matter if someone had: create statistic a_stats1 (mcv) on a,b from t; create statistic a_stats2 (histogram) on a,b from t; create statistic a_stats3 (mcv,histogram) on a,b from t; Is it fine to just return a_stats1 and ignore the fact that a_stats3 is probably better? Or too corner case to care? 9. examine_equality_clause() assumes it'll get a Var. I see we should only allow clauses that pass statext_is_compatible_clause_internal(), so maybe it's worth an Assert(IsA(var, Var)) along with a comment to mention anything else could not have been allowed. 10. Does examine_equality_clause need 'root' as an argument? 11. UINT16_MAX -> PG_UINT16_MAX /* make sure we fit into uint16 */ Assert(count <= UINT16_MAX); (Out of energy for today.) -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services