Re: BUG #15160: planner overestimates number of rows in join when there are more than 200 rows coming from CTE

Melanie Plageman Tue, 20 Nov 2018 12:25:57 -0800

Given that you have addressed all of my feedback and that it's a pretty
low-risk change, I will change the status to "ready for committer".

There are a couple of minor follow-up clarifications inline that relate
mostly
to the questions that I asked in previous emails.

I did have one other question:
Has there been discussion in the past about adding a planner test extension
similar to those in src/test/modules for cardinality estimation? I am
imagining
something that is a "soft" check that either the rows estimation that comes
out
of calc_joinrel_size_estimate is within an expected range (given differing
estimates across machines) or that the selectivity estimate that comes out
of
eqjoinsel is within an expected range. The former seems relatively easy to
do
in a manner similar to the test_predtest extension and the latter seems
like it
could be done even more trivially.

On Sat, Nov 17, 2018 at 12:22 PM Tom Lane <t...@sss.pgh.pa.us> wrote:

> (Here, both "outer rel's size" and "inner rel's size" mean the size after
> earlier filtering steps.)  So that's why we only clamp nd2 and only do so
> in eqjoinsel_semi: in the other three cases, we'd be double-counting the
> selectivity of earlier filters if we did that.
>
> I just want to make sure I am understanding what the comment is saying: So,
after we calculate the selectivity for inner join, when we return from
calc_joinrel_size_estimate we do this math:

    nrows = outer_rows * inner_rows * fkselec * jselec;

and in that equation, the outer and inner rows have been adjusted to account
for any restrictions on the tables, so we don't clamp the ndvs for inner
join
in eqjoinsel_inner. However, for semi-join, that calculation is

    nrows = outer_rows * fkselec * jselec;

Which means that we have to adjust the rows of the inner side before we get
here?

> So basically the inconsistency here comes from the fact that we define
> the meaning of join selectivity differently for inner and semi joins.
> I've occasionally wondered if that was a bad choice and we should just
> say that selectivity should always be calculated so that the join size
> is outer size times inner size times selectivity.  But that would
> certainly not make for any less need for the selectivity estimator to
> do things differently for inner and semi joins, so I am not seeing much
> upside to changing it.
>

I see what you are saying. I got tangled up in this part of the code, so I
am
inclined to say that it could stand to be more clear. Selectivity is a
ratio,
and, even if you calculate the two sides of the ratio differently, that
doesn't
mean the definition of the ratio should be different.

Also, I wanted to address a question you asked in an earlier email:
You wrote:
> Hm.  Maybe the "Psemi" and "Pinner" notation is not helpful ... would
> "Ssemi" and "Sinner" be better?

I think Ssemi and Sinner might be more clear--mostly because we haven't used
P/predicate here or in most of the other selectivity estimation comments
that I
read. Also, in some cases when we have super limited information and make a
guess, the selectivity feels pretty detached from the join predicate.

Thanks!

Re: BUG #15160: planner overestimates number of rows in join when there are more than 200 rows coming from CTE

Reply via email to