Tom Lane wrote:
Maciej Babinski <[EMAIL PROTECTED]> writes:
Tom Lane wrote:
I see no bug here.  AFAICT your "much faster" query gets that way by
having eliminated all the candidate join rows on the B side.

The additional clause eliminates no rows beyond what the existing
clause would.  Any row eliminated by "b.join_id IS NOT NULL" could not
possibly have satisfied "a.join_id = b.join_id".

Hmm.  You assume that the = operator is strict, which is probably true,
but the hash join code isn't assuming that.


Yes, both queries return identical results.

It might be worth checking for the case, though.  What's happening,
since we go ahead and put the null rows into the hash table, is that
they all end up in the same hash chain because they all get hash code 0.
And then that very long chain gets searched for each null outer row.
If we knew the join operator is strict we could discard null rows
immediately on both sides.

Please note that if the join columns are not null, but still produce
no matches for the join, the results are fast without the need for an
extra clause in the join:

Yeah, because the rows get reasonably well distributed into different
hash buckets.  The optimizer will avoid a hash if it sees the data is
not well-distributed, but IIRC it's not considering nulls when it
makes that decision.


You're right. Filling one join column with '1', and the other with '2'
yields a plan without a hash join. Something that may be worth noting is
that while my initial problem case (200,000 rows in more complicated tables), as well as the sample test case I put together (10,000 rows in simple tables)
both yield this result, changing my test case to use 20,000 rows produces
a plan that avoids hash joins as well:

maciej_test=# EXPLAIN ANALYZE SELECT * FROM a JOIN b ON a.join_id = b.join_id;
                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------
Merge Join (cost=3435.54..3635.55 rows=1 width=16) (actual time=100.822..100.822 rows=0 loops=1)
   Merge Cond: ("outer".join_id = "inner".join_id)
-> Sort (cost=1717.77..1767.77 rows=20000 width=8) (actual time=65.401..86.333 rows=20000 loops=1)
         Sort Key: a.join_id
-> Seq Scan on a (cost=0.00..289.00 rows=20000 width=8) (actual time=0.005..21.119 rows=20000 loops=1)
   ->  Sort  (cost=1717.77..1767.77 rows=20000 width=8) (never executed)
         Sort Key: b.join_id
-> Seq Scan on b (cost=0.00..289.00 rows=20000 width=8) (never executed)
 Total runtime: 101.891 ms
(9 rows)

Time: 103.051 ms

Thanks for you time,
Maciej Babinski

---------------------------(end of broadcast)---------------------------
TIP 7: You can help support the PostgreSQL project by donating at

               http://www.postgresql.org/about/donate

Reply via email to