On Thursday, January 12, 2012 01:01:01 AM Tom Lane wrote:
> Andres Freund <and...@anarazel.de> writes:
> Also, why the asymmetry in null handling? And why did you only touch
> one of the two code paths in eqjoinsel_semi? They have both got this
> issue of how to estimate with inadequate stats.
This patch was purely a proof of concept, sorry if that wasn't clear. I mostly
wanted to point out that real plans regressed with this change.
Digging around a bit I could find more examples where it caused real pain, but
also cases where the new behaviour was advantageous.
Unfortunately the pastebins where raptelan provided plans have expired by
now... Perhaps he can provide them again?

If we agree on a way to handle the stats I am happy to produce a patch that
actually tries to cover all the cases.

> > Whats your opinion on this?
> Looks pretty bogus to me. You're essentially assuming that the side of
> the join without statistics is unique, which is a mighty dubious
> assumption.
It sure is a bit dubious. But assuming that a semijoin with a max of n rows on
the inner side results in half of the outer side's rows (>> n) is pretty bogus
as well.
Using the assumption of uniqueness for the outer side seems sensible if it is
only used as an upper limit (except in an antijoin...); a toy calculation
illustrating this is at the end of this mail. Yes, my "patch" didn't even
start to do this ;)

SELECT * FROM blub WHERE foo IN (SELECT something_with_aggregation);
is not exactly a fringe case, so I find it problematic to regress quite a bit
in the estimates there.

> (In cases where we *know* it's unique, something like this
> could be reasonable, but I believe get_variable_numdistinct already
> accounts for such cases.)
Only that we infer uniqueness from very few things, unless I am missing
something...

Andres
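
To make the numbers concrete, here is a toy, standalone calculation of what
the two approaches estimate for a query shaped like the one above. This is
not the actual eqjoinsel_semi code from selfuncs.c; the row counts and
variable names are made up purely for illustration:

/*
 * Toy illustration of the two semijoin estimates discussed above.
 * NOT the code from selfuncs.c; all numbers are invented.
 *
 * Shape: SELECT * FROM blub WHERE foo IN (SELECT something_with_aggregation)
 *   outer side (blub):                1,000,000 rows
 *   inner side (aggregated subquery): at most 100 rows, no statistics
 */
#include <stdio.h>

int
main(void)
{
	double	outer_rows = 1000000.0;	/* rows on the outer side */
	double	inner_rows = 100.0;		/* max rows the subquery can emit */

	/* Flat fallback: with no stats, assume half the outer rows match. */
	double	flat_estimate = 0.5 * outer_rows;

	/*
	 * Uniqueness as an upper limit: if the outer side were unique, each of
	 * the (at most) inner_rows distinct inner values could match at most
	 * one outer row, so the semijoin cannot return more than inner_rows
	 * rows.  Used as a clamp, not as a replacement for the flat estimate.
	 */
	double	upper_bound = inner_rows;
	double	clamped = flat_estimate < upper_bound ? flat_estimate : upper_bound;

	printf("flat 50%% estimate:      %.0f rows\n", flat_estimate);
	printf("uniqueness upper bound: %.0f rows\n", upper_bound);
	printf("clamped estimate:       %.0f rows\n", clamped);
	return 0;
}

With these made-up numbers the flat estimate is 500000 rows versus 100 with
the clamp, which is roughly the size of misestimate that makes the
IN-over-an-aggregating-subquery case regress.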