On Sun, Jan 26, 2003 at 09:43:18PM -0500, Tom Lane wrote: > We're already checking that as a separate plan alternative. The > implementation could be improved a little, though --- we could combine > the uniq-ification into the Hash node.
Right, or skip it entirely when selecting stuff with unique constraints. > AFAICS there are two or three different concepts of selectivity being > tossed about here. <snip> Ah, OK. Yeah, I think I was confusing myself there. > > It might be that we have to bite the bullet and set up a different > selectivity estimation procedure for IN. I'd prefer to avoid that > but haven't really figured out if it's necessary or not. I don't think it is. The number of rows is correct if you do product_id IN (1) vs product_id IN (1,2) vs product_id IN (1,2,3) and so on. The only difference for a subselect is that the number of values generated is only known approximately, rather than exactly, but I don't think that thats relevent. > In the meantime, I think you are right that it's bogus that > set_joinrel_size_estimates uses different equations for JOIN_IN and > JOIN_UNIQUE_INNER. Whatever the decision is about how to do the > estimation, these should be done the same way. Hmm. So, I made that change, and now get an estimate of 50218 rows for the join result, which is about right (although capping the result to have a maximum of the number of input rows is valid for an inner join), but the cost for the hashjoin is still 4281586.50. cost_hashjoin probably needs to be taught about the short circuiting done for _IN, but I'm not sure where to do that from a quick glance through the code. What is the point of JOIN_UNIQUE_{INNER,OUTER}, though? What does it do that JOIN_IN doesn't? executor/* doesn't appear to use it. > regards, tom lane Bradley ---------------------------(end of broadcast)--------------------------- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]