Given that you have addressed all of my feedback and that it's a pretty low-risk change, I will change the status to "ready for committer".
There are a couple of minor follow-up clarifications inline that relate mostly to the questions that I asked in previous emails. I did have one other question: Has there been discussion in the past about adding a planner test extension similar to those in src/test/modules for cardinality estimation? I am imagining something that is a "soft" check that either the rows estimation that comes out of calc_joinrel_size_estimate is within an expected range (given differing estimates across machines) or that the selectivity estimate that comes out of eqjoinsel is within an expected range. The former seems relatively easy to do in a manner similar to the test_predtest extension and the latter seems like it could be done even more trivially. On Sat, Nov 17, 2018 at 12:22 PM Tom Lane <t...@sss.pgh.pa.us> wrote: > (Here, both "outer rel's size" and "inner rel's size" mean the size after > earlier filtering steps.) So that's why we only clamp nd2 and only do so > in eqjoinsel_semi: in the other three cases, we'd be double-counting the > selectivity of earlier filters if we did that. > > I just want to make sure I am understanding what the comment is saying: So, after we calculate the selectivity for inner join, when we return from calc_joinrel_size_estimate we do this math: nrows = outer_rows * inner_rows * fkselec * jselec; and in that equation, the outer and inner rows have been adjusted to account for any restrictions on the tables, so we don't clamp the ndvs for inner join in eqjoinsel_inner. However, for semi-join, that calculation is nrows = outer_rows * fkselec * jselec; Which means that we have to adjust the rows of the inner side before we get here? > So basically the inconsistency here comes from the fact that we define > the meaning of join selectivity differently for inner and semi joins. > I've occasionally wondered if that was a bad choice and we should just > say that selectivity should always be calculated so that the join size > is outer size times inner size times selectivity. But that would > certainly not make for any less need for the selectivity estimator to > do things differently for inner and semi joins, so I am not seeing much > upside to changing it. > I see what you are saying. I got tangled up in this part of the code, so I am inclined to say that it could stand to be more clear. Selectivity is a ratio, and, even if you calculate the two sides of the ratio differently, that doesn't mean the definition of the ratio should be different. Also, I wanted to address a question you asked in an earlier email: You wrote: > Hm. Maybe the "Psemi" and "Pinner" notation is not helpful ... would > "Ssemi" and "Sinner" be better? I think Ssemi and Sinner might be more clear--mostly because we haven't used P/predicate here or in most of the other selectivity estimation comments that I read. Also, in some cases when we have super limited information and make a guess, the selectivity feels pretty detached from the join predicate. Thanks!