Bradley Baetz <[EMAIL PROTECTED]> writes:
> I've been trying out the new hashed subselect code from CVS. It appears
> that the planner isn't taking the distinctness of the values from the
> subselect into account:
This isn't really anything to do with the new IN code, but is a
long-standing problem: cost_mergejoin doesn't apply any penalty factor
for the case where there are lots of duplicates in both the inner and
outer relations (causing rescans of big chunks of the inner relation).
You can see the rescanning happening in the EXPLAIN ANALYZE output:

>  ->  Merge Join  (cost=0.00..3485661.38 rows=5570 width=8) (actual
>      time=0.15..1429696.69 rows=50000 loops=1)
>        Merge Cond: ("outer".product_id = "inner".product_id)
>        ->  Index Scan using bugs_product_id_idx on bugs
>            (cost=0.00..2313.33 rows=50000 width=4) (actual time=0.12..358.43
>            rows=50000 loops=1)
>        ->  Index Scan using bugs_product_id_idx on bugs
>            (cost=0.00..2313.33 rows=50000 width=4) (actual time=0.01..1152455.44
>            rows=277884160 loops=1)
             ^^^^^^^^^^^^^^

277884160 rows pulled from a 50000-row relation means a heck of a lot of
rescanning went on :-(  (The number is about what you'd expect, too: with
n rows spread evenly over d distinct join values, a mergejoin fetches
roughly n^2/d inner tuples, and 50000^2/277884160 back-solves to only
about nine distinct product_ids.)

The good news is that the system *is* aware of the small number of
distinct values in the table (note the dead-on estimate of the number of
distinct rows in your other query, which I think comes from new-for-7.4
code, though the required stats have been available since 7.2).

I think it'd probably be possible to make some reasonable estimate of the
amount of rescanning required, and then inflate the mergejoin cost
estimate proportionally.  I have not gotten around to looking at the
problem, though.  Care to take a shot at it?
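To give an idea of the shape of the estimate I have in mind, here is a
standalone sketch of the arithmetic (just a toy program assuming a flat
uniform distribution over the distinct values, driven by the kind of
number the n_distinct stats would give you; none of these names exist in
costsize.c):

#include <stdio.h>

/*
 * Hypothetical rescan estimate: with outer_rows spread evenly over
 * num_distinct join values, each outer tuple rescans its whole matching
 * inner group, so the inner scan fetches about
 * (outer_rows / num_distinct) * inner_rows tuples in total, instead of
 * just inner_rows for a duplicate-free merge.
 */
static double
estimated_inner_fetches(double outer_rows, double inner_rows,
                        double num_distinct)
{
    double avg_outer_group = outer_rows / num_distinct;

    return avg_outer_group * inner_rows;
}

int
main(void)
{
    double outer_rows = 50000.0;   /* bugs */
    double inner_rows = 50000.0;   /* bugs again (self-join on product_id) */
    double num_distinct = 9.0;     /* back-solved above from the actuals */

    double fetches = estimated_inner_fetches(outer_rows, inner_rows,
                                             num_distinct);

    printf("estimated inner tuples fetched: %.0f (actual: 277884160)\n",
           fetches);
    /* Factor to inflate the naive scan-the-inner-rel-once cost by. */
    printf("cost inflation factor: %.1f\n", fetches / inner_rows);
    return 0;
}

A real patch would presumably want to fold in the per-value frequencies
from pg_statistic rather than assume uniformity, but even this crude
version lands within a fraction of a percent of the observed fetch count.

			regards, tom lane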