Tom Lane wrote:
"Maciej Babinski" <[EMAIL PROTECTED]> writes:
Hash join of columns with many null fields is very slow unless the null
fields are commented out.

I see no bug here.  AFAICT your "much faster" query gets that way by
having eliminated all the candidate join rows on the B side.

                        regards, tom lane


The additional clause eliminates no rows beyond what the existing clause would. Any row eliminated by "b.join_id IS NOT NULL" could not possibly have satisfied
"a.join_id = b.join_id".

Please note that if the join columns are not null, but still produce no matches for the join, the results are fast without the need for an extra clause in the join:

DROP TABLE a;
DROP TABLE b;

CREATE TABLE a (id integer, join_id integer);
CREATE TABLE b (id integer, join_id integer);

INSERT INTO a (id) SELECT generate_series(1,10000);
INSERT INTO b (id) SELECT generate_series(1,10000);

ANALYZE a;
ANALYZE b;

EXPLAIN ANALYZE SELECT * FROM a JOIN b ON a.join_id = b.join_id; /* 14 seconds */ EXPLAIN ANALYZE SELECT * FROM a JOIN b ON a.join_id = b.join_id AND b.join_id IS NOT NULL; /* 5ms */

UPDATE a SET join_id=1;
UPDATE b SET join_id=2;

EXPLAIN ANALYZE SELECT * FROM a JOIN b ON a.join_id = b.join_id; /* 72ms */
EXPLAIN ANALYZE SELECT * FROM a JOIN b ON a.join_id = b.join_id AND b.join_id != 2; /* 48ms */


It seems to me that such a wild disparity in performance due to the addition of a clause that is implied by the existing clause should be considered a bug, but if I need to submit a feature
request for the optimizer, then I'd be happy to. Thanks!

Maciej Babinski

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Reply via email to