Re: [BUGS] New hashed IN code ignores distinctiveness of subquery

Bradley Baetz Sun, 26 Jan 2003 19:23:21 -0800

On Sun, Jan 26, 2003 at 09:43:18PM -0500, Tom Lane wrote:

> We're already checking that as a separate plan alternative.  The
> implementation could be improved a little, though --- we could combine
> the uniq-ification into the Hash node.


Right, or skip it entirely when selecting stuff with unique constraints.

> AFAICS there are two or three different concepts of selectivity being
> tossed about here.

<snip>

Ah, OK. Yeah, I think I was confusing myself there.

> 
> It might be that we have to bite the bullet and set up a different
> selectivity estimation procedure for IN.  I'd prefer to avoid that
> but haven't really figured out if it's necessary or not.

I don't think it is. The number of rows is correct if you do product_id
IN (1) vs product_id IN (1,2) vs product_id IN (1,2,3) and so on. The
only difference for a subselect is that the number of values generated
is only known approximately, rather than exactly, but I don't think that
thats relevent.

> In the meantime, I think you are right that it's bogus that
> set_joinrel_size_estimates uses different equations for JOIN_IN and
> JOIN_UNIQUE_INNER.  Whatever the decision is about how to do the
> estimation, these should be done the same way.

Hmm. So, I made that change, and now get an estimate of 50218 rows for
the join result, which is about right (although capping the result to
have a maximum of the number of input rows is valid for an inner join),
but the cost for the hashjoin is still 4281586.50. cost_hashjoin
probably needs to be taught about the short circuiting done for _IN, but
I'm not sure where to do that from a quick glance through the code.

What is the point of JOIN_UNIQUE_{INNER,OUTER}, though? What does it do
that JOIN_IN doesn't? executor/* doesn't appear to use it.

>                       regards, tom lane

Bradley

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]

Re: [BUGS] New hashed IN code ignores distinctiveness of subquery

Reply via email to