On 03/26/2018 12:31 PM, Dean Rasheed wrote: > On 18 March 2018 at 23:57, Tomas Vondra <tomas.von...@2ndquadrant.com> wrote: >> Attached is an updated version of the patch series, addressing issues >> pointed out by Alvaro. > > I'm just starting to look at this now, and I think I'll post > individual comments/questions as I get to them rather than trying to > review the whole thing, because it's quite a large patch. Apologies > if some of this has already been discussed. >
Sure, works for me. And thanks for looking! > Looking at the changes to UpdateStatisticsForTypeChange(): > > + memset(nulls, 1, Natts_pg_statistic_ext * sizeof(bool)); > > why the "1" there -- is it just a typo? > Yeah, that should be 0. It's not causing any issues, because the "replaces" array is initialized to 0 so we're not really using the null value except for individual entries like here: if (statext_is_kind_built(oldtup, STATS_EXT_MCV)) { replaces[Anum_pg_statistic_ext_stxmcv - 1] = true; nulls[Anum_pg_statistic_ext_stxmcv - 1] = true; } but that sets the "nulls" to true anyway. > A wider concern I have is that I think this function is trying to be > too clever by only resetting selected stats. IMO it should just reset > all stats unconditionally when the column type changes, which would > be consistent with what we do for regular stats. > > Consider, for example, what would happen if a column was changed from > real to int -- all the data values will be coerced to integers, > losing precision, and any ndistinct and dependency stats would > likely be completely wrong afterwards. IMO that's a bug, and should > be back-patched independently of these new types of extended stats. > > Thoughts? > The argument a year ago was that it's more plausible that the semantics remains the same. I think the question is how the type change affects precision - had the type change in the opposite direction (int to real) there would be no problem, because both ndistinct and dependencies would produce the same statistics. In my experience people are far more likely to change data types in a way that preserves precision, so I think the current behavior is OK. The other reason is that when reducing precision, it generally enforces the dependency (you can't violate functional dependencies or break grouping by merging values). So you will have stale stats with weaker dependencies, but it's still better than not having any. But that's mostly unrelated to this patch, of course - for MCV lists and histograms we can't keep the stats anyway, because the stats actually do contain the type values (unlike stats introduced in PG10). Actually, to be more accurate - we now store OIDs of the data types in the MCV/histogram stats, so perhaps we could keep those too. But that would be way more work (if at all possible). regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services