Re: Additional improvements to extended statistics

Dean Rasheed Mon, 09 Mar 2020 01:36:52 -0700

On Mon, 9 Mar 2020 at 00:02, Tomas Vondra <tomas.von...@2ndquadrant.com> wrote:
>
> Speaking of which, would you take a look at [1]? I think supporting SAOP
> is fine, but I wonder if you agree with my conclusion we can't really
> support inclusion @> as explained in [2].
>


Hmm, I'm not sure. However, thinking about your example in [2] reminds
me of a thought I had a while ago, but then forgot about --- there is
a flaw in the formula used for computing probabilities with functional
dependencies:

  P(a,b) = P(a) * [f + (1-f)*P(b)]

because it might return a value that is larger than P(b), which
obviously should not be possible. We should amend that formula to
prevent a result larger than P(b). The obvious way to do that would be
to use:

  P(a,b) = Min(P(a) * [f + (1-f)*P(b)], P(b))

but actually I think it would be better and more principled to use:

  P(a,b) = f*Min(P(a),P(b)) + (1-f)*P(a)*P(b)

I.e., for those rows believed to be functionally dependent, we use the
minimum probability, and for the rows believed to be independent, we
use the product.
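
To make the difference concrete, here is a small numeric sketch of the
three formulas. It is illustrative only, not code from any patch: the
function names and the sample values P(a) = 0.5, P(b) = 0.1, f = 0.9 are
made up for the example.

```python
def original(pa: float, pb: float, f: float) -> float:
    """Current formula: P(a) * [f + (1-f)*P(b)]."""
    return pa * (f + (1 - f) * pb)


def clamped(pa: float, pb: float, f: float) -> float:
    """Current formula, capped so it never exceeds P(b)."""
    return min(original(pa, pb, f), pb)


def proposed(pa: float, pb: float, f: float) -> float:
    """Minimum for the dependent fraction f, product for the rest."""
    return f * min(pa, pb) + (1 - f) * pa * pb


if __name__ == "__main__":
    # Hypothetical inputs: a strong dependency and a clause on b that is
    # much more selective than the clause on a.
    pa, pb, f = 0.5, 0.1, 0.9
    print("original:", original(pa, pb, f))  # about 0.455, larger than P(b)
    print("clamped: ", clamped(pa, pb, f))   # P(b) = 0.1
    print("proposed:", proposed(pa, pb, f))  # about 0.095
```

With these inputs the current formula gives roughly 0.455, well above
P(b) = 0.1. The last formula stays at or below Min(P(a),P(b)) for any f
in [0,1], because it is a weighted average of Min(P(a),P(b)) and
P(a)*P(b), and the product can never exceed the minimum.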

I think that would solve the problem with the example you gave at the
end of [2], but I'm not sure if it helps with the general case.

Regards,
Dean


> [1] https://www.postgresql.org/message-id/flat/13902317.Eha0YfKkKy@pierred-pdoc
> [2] https://www.postgresql.org/message-id/20200202184134.swoqkqlqorqolrqv%40development
