The following bug has been logged online: Bug reference: 2930 Logged by: Maciej Babinski Email address: [EMAIL PROTECTED] PostgreSQL version: 8.2 Operating system: CentOS 4.4 (linux 2.6.9) Description: Hash join abyssmal with many null fields. Details:
Hash join of columns with many null fields is very slow unless the null fields are commented out. Steps to reproduce: CREATE TABLE a (id integer, join_id integer); CREATE TABLE b (id integer, join_id integer); INSERT INTO a (id) SELECT generate_series(1,10000); INSERT INTO b (id) SELECT generate_series(1,10000); ANALYZE a; ANALYZE b; EXPLAIN ANALYZE SELECT * FROM a LEFT JOIN b ON a.join_id = b.join_id; EXPLAIN ANALYZE SELECT * FROM a LEFT JOIN b ON a.join_id = b.join_id AND b.join_id IS NOT NULL; Here are the results of the two explains: test1=# EXPLAIN ANALYZE SELECT * FROM a LEFT JOIN b ON a.join_id = b.join_id; EXPLAIN ANALYZE SELECT * FROM a LEFT JOIN b ON a.join_id = b.join_id AND b.join_id IS NOT NULL; QUERY PLAN ---------------------------------------------------------------------------- -------------------------------------- Hash Left Join (cost=170.00..1590.01 rows=10000 width=16) (actual time=23.100..9451.778 rows=10000 loops=1) Hash Cond: (a.join_id = b.join_id) -> Seq Scan on a (cost=0.00..145.00 rows=10000 width=8) (actual time=0.014..11.071 rows=10000 loops=1) -> Hash (cost=145.00..145.00 rows=10000 width=8) (actual time=21.999..21.999 rows=10000 loops=1) -> Seq Scan on b (cost=0.00..145.00 rows=10000 width=8) (actual time=0.011..10.679 rows=10000 loops=1) Total runtime: 9460.944 ms (6 rows) test1=# EXPLAIN ANALYZE SELECT * FROM a LEFT JOIN b ON a.join_id = b.join_id AND b.join_id IS NOT NULL; QUERY PLAN ---------------------------------------------------------------------------- -------------------------------- Hash Left Join (cost=145.00..340.01 rows=10000 width=16) (actual time=1.970..32.452 rows=10000 loops=1) Hash Cond: (a.join_id = b.join_id) -> Seq Scan on a (cost=0.00..145.00 rows=10000 width=8) (actual time=0.033..10.326 rows=10000 loops=1) -> Hash (cost=145.00..145.00 rows=1 width=8) (actual time=1.924..1.924 rows=0 loops=1) -> Seq Scan on b (cost=0.00..145.00 rows=1 width=8) (actual time=1.922..1.922 rows=0 loops=1) Filter: (join_id IS NOT NULL) Total runtime: 41.495 ms (7 rows) Note that the two ON clauses are logically identical, but the second query is much faster than the first. The first query will slow down exponentially with the number of rows: 200,000 rows yielded a runtime of over ten hours. Thanks! ---------------------------(end of broadcast)--------------------------- TIP 6: explain analyze is your friend