On Thu, Apr 16, 2009 at 7:26 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Robert Haas <robertmh...@gmail.com> writes:
>> Upon further review, it appears that a big part of this problem is
>> that cost_hashjoin() doesn't understand that it needs cost semi-joins
>> differently from inner or left joins.
>
> Yeah, I have a note to look into that before 8.4 final.  The same is
> true for nestloops: stopping after hitting one match to the current
> outer can make a big difference, and that's not reflected in costsize.c
> yet.  I'm not sure whether cost_mergejoin needs to care.

Cool.  It's worth noting that this is also a problem for anti-joins,
which can also stop after hitting one match.

For merge join, it looks like you might save something if there are
duplicates in both the inner and outer lists.  If the outer side
contains 1,1,1,1,1,2 and the inner side contains 1,1,1,2,2, then an
INNER JOIN or LEFT JOIN will rescan the first three tuples of the
inner side three times, whereas a SEMI or ANTI JOIN will scan the
first tuple three times and then the next two only once.  But I'm not
sure if we can estimate this accurately enough to matter.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to