Andres Freund <and...@anarazel.de> writes: > On Thursday, January 12, 2012 01:01:01 AM Tom Lane wrote: >> Looks pretty bogus to me. You're essentially assuming that the side of >> the join without statistics is unique, which is a mighty dubious >> assumption.
> It sure is a bit dubious. But assuming that a semijoin that has max of n rows > on the inner side results in half of the outer sides rows (>> n) is pretty > bogus as well. How so? There is no reason to think that the number of LHS rows with a match is limited by the number of RHS rows. If we knew that the LHS join key was unique, then yes that'd be sensible, but we don't know that. > SELECT * FROM blub WHERE foo IN (SELECT something_with_aggregation); > is not exactly a fringe case, so I find it problematic regressing > quite a bit in the estimates. Agreed, and that's why I don't want to put in a patch that carries the risk of regressing even more. I'm happy to do something that's got some amount of theory behind it, but if we're just guessing, we can't afford to guess a very high or low selectivity. One thing I've considered but not done anything about is that in a lot of practical cases for this, the aggregation or grouping properties of the sub-select would provide adequate reason for assuming its output is more or less unique, so that taking ndistinct equal to number of rows actually is sane. But it would need a bit of thought about what properties we want to treat as justifying such an assumption, and then some code to see if the join key is a Var coming out of such a sub-select. (Actually, what such a patch would probably look like is modifying examine_simple_variable to not just punt when it finds the Var came from an aggregating subquery.) regards, tom lane -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs