On Thu, Apr 16, 2009 at 7:26 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Robert Haas <robertmh...@gmail.com> writes:
>> Upon further review, it appears that a big part of this problem is
>> that cost_hashjoin() doesn't understand that it needs to cost
>> semi-joins differently from inner or left joins.
>
> Yeah, I have a note to look into that before 8.4 final.  The same is
> true for nestloops: stopping after hitting one match to the current
> outer can make a big difference, and that's not reflected in costsize.c
> yet.  I'm not sure whether cost_mergejoin needs to care.

Cool.  It's worth noting that this is also a problem for anti-joins,
which can also stop after hitting one match.

For merge join, it looks like you might save something if there are
duplicates in both the inner and outer lists.  If the outer side
contains 1,1,1,1,1,2 and the inner side contains 1,1,1,2,2, then an
INNER JOIN or LEFT JOIN will scan the first three tuples of the inner
side five times (once for each outer 1), whereas a SEMI or ANTI JOIN
will scan the first inner tuple five times and then step past the next
two only once.  But I'm not sure if we can estimate this accurately
enough to matter.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
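
To make the merge join example above concrete, here is a rough Python
sketch that counts how often each inner tuple is compared against a
matching outer tuple.  It is a toy model, not the executor's actual
merge join code: the function name count_matched_visits and the
stop_at_first_match flag are made up for illustration, and stepping
forward past duplicates that are never examined is deliberately not
counted.

```python
def count_matched_visits(outer, inner, stop_at_first_match):
    """Count, per inner position, how many times that tuple is compared
    against an outer tuple with an equal key.

    Toy model of a merge join over two ascending-sorted key lists.
    stop_at_first_match=True mimics SEMI/ANTI JOIN behaviour (hypothetical
    flag, not a PostgreSQL parameter); False mimics INNER/LEFT JOIN.
    """
    visits = [0] * len(inner)
    start = 0  # first inner position that can still match the current key

    for key in outer:
        # Step past inner tuples that sort before the current outer key.
        # These are never revisited and are not counted as visits here.
        while start < len(inner) and inner[start] < key:
            start += 1

        # Walk the group of inner duplicates matching this outer tuple.
        pos = start
        while pos < len(inner) and inner[pos] == key:
            visits[pos] += 1
            if stop_at_first_match:
                break  # one match is enough for a semi- or anti-join
            pos += 1

    return visits


outer_keys = [1, 1, 1, 1, 1, 2]
inner_keys = [1, 1, 1, 2, 2]

inner_join_visits = count_matched_visits(outer_keys, inner_keys, False)
semi_join_visits = count_matched_visits(outer_keys, inner_keys, True)

print(inner_join_visits, sum(inner_join_visits))  # [5, 5, 5, 1, 1] 17
print(semi_join_visits, sum(semi_join_visits))    # [5, 0, 0, 1, 0] 6
```

In this model the saving grows with the product of the duplicate counts
on both sides, which is the quantity a cost estimate would need to
predict.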