Re: [HACKERS] PATCH: multivariate histograms and MCV lists

Tomas Vondra Mon, 26 Mar 2018 12:10:33 -0700

On 03/26/2018 06:21 PM, Dean Rasheed wrote:
> On 26 March 2018 at 14:08, Tomas Vondra <tomas.von...@2ndquadrant.com> wrote:
>> On 03/26/2018 12:31 PM, Dean Rasheed wrote:
>>> A wider concern I have is that I think this function is trying to be
>>> too clever by only resetting selected stats. IMO it should just reset
>>> all stats unconditionally when the column type changes, which would
>>> be consistent with what we do for regular stats.
>>>
>> The argument a year ago was that it's more plausible that the semantics
>> remains the same. I think the question is how the type change affects
>> precision - had the type change in the opposite direction (int to real)
>> there would be no problem, because both ndistinct and dependencies would
>> produce the same statistics.
>>
>> In my experience people are far more likely to change data types in a
>> way that preserves precision, so I think the current behavior is OK.
> 
> Hmm, I don't really buy that argument. Altering a column's type
> allows the data in it to be rewritten in arbitrary ways, and I don't
> think we should presume that the statistics will still be valid just
> because the user *probably* won't do something that changes the data
> much.
>


Maybe. I can only really speak from my own experience, and in those cases
it's usually "the column is an INT and I need a FLOAT". But you're right
that it's not guaranteed to be like that, so perhaps the right thing to do
is to reset the stats.

Another reason to do that might be consistency - resetting just some of
the stats might be surprising for users. And we're already resetting
per-column stats on that column, so users have to run ANALYZE anyway.

BTW in my response I claimed this:

>
> The other reason is that when reducing precision, it generally
> enforces the dependency (you can't violate functional dependencies or
> break grouping by merging values). So you will have stale stats with
> weaker dependencies, but it's still better than not having any.
>

That's actually bogus. For functional dependencies, for example, it
matters on which side of the dependency we reduce precision. With an
(a->b) dependency, reducing precision of "b" does indeed strengthen it,
but reducing precision of "a" weakens it. So I take that back.
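
To illustrate the asymmetry, here is a rough Python sketch on made-up
data. The function dependency_degree() and the sample rows are
hypothetical, and the measure is a simplification (the fraction of rows
whose "a" value maps to exactly one "b" value), not necessarily what the
extended statistics code computes.

```python
from collections import defaultdict


def dependency_degree(rows):
    """Fraction of rows whose "a" value maps to exactly one "b" value.

    Simplified, hypothetical measure of how well (a->b) holds.
    """
    b_values = defaultdict(set)    # distinct "b" values seen per "a"
    group_size = defaultdict(int)  # number of rows per "a"
    for a, b in rows:
        b_values[a].add(b)
        group_size[a] += 1
    supporting = sum(n for a, n in group_size.items()
                     if len(b_values[a]) == 1)
    return supporting / len(rows)


# Made-up (a, b) rows: a = 1.1 maps to two different b values,
# so (a->b) holds only partially.
rows = [(1.1, 10), (1.1, 11), (1.2, 20), (2.1, 30)]

# Reduce precision of "b": round down to a multiple of 10.
coarse_b = [(a, b // 10 * 10) for a, b in rows]

# Reduce precision of "a": truncate to an integer.
coarse_a = [(int(a), b) for a, b in rows]

original = dependency_degree(rows)
after_b = dependency_degree(coarse_b)
after_a = dependency_degree(coarse_a)

print(original, after_b, after_a)
```

On this data, merging values of "b" removes the conflict within
a = 1.1, so the degree goes up. Merging values of "a" puts rows with
different "b" values into the same group, so the degree goes down.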

So, I'm not particularly opposed to just resetting extended stats
referencing the altered column.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
