On Tue, Mar 17, 2020 at 06:05:17PM +0100, Tomas Vondra wrote:
On Tue, Mar 17, 2020 at 04:14:26PM +0000, Dean Rasheed wrote:
On Tue, 17 Mar 2020 at 15:37, Tomas Vondra <tomas.von...@2ndquadrant.com> wrote:

On Tue, Mar 17, 2020 at 12:42:52PM +0000, Dean Rasheed wrote:

The other thing that I'm still concerned about is the possibility of
returning estimates with P(a,b) > P(a) or P(b). I think that such a
thing becomes much more likely with the new types of clause supported
here, because they now allow multiple values from each column, where
before we only allowed one. I took another look at the patch I posted
on the other thread, and I've convinced myself that it's correct.
Attached is an updated version, with some cosmetic tidying up and now
with some additional regression tests.

Yeah, I agree that's something we need to fix. Do you plan to push the
fix, or do you want me to do it?


I can push it. Have you had a chance to review it?


Not yet, I'll take a look today.


OK, I took a look. I think from the correctness POV the patch is OK, but
I think the dependencies_clauselist_selectivity() function now got a bit
too complex. I've been able to parse it now, but I'm sure I'll have
trouble in the future :-(

Can we refactor / split it somehow and move bits of the logic to smaller
functions, or something like that?

Another thing I'd like to suggest is keeping the "old" formula, and
instead of just replacing it with

   P(a,b) = f * Min(P(a), P(b)) + (1-f) * P(a) * P(b)

but explaining how the old formula may produce nonsensical selectivity,
and how the new formula addresses that issue.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply via email to