Bradley Baetz <[EMAIL PROTECTED]> writes:
> I've been trying out the new hashed subselect code from CVS. It appears
> that the planner isn't taking the distinctness of the values from the
> subselect into account:
This isn't really anything to do with the new IN code, but is a
long-standing problem: cost_mergejoin doesn't apply any penalty factor
for the case where there are lots of duplicates in both the inner and
outer relations (causing rescans of big chunks of the inner relation).
You can see the rescanning happening in the EXPLAIN ANALYZE output:

>  ->  Merge Join  (cost=0.00..3485661.38 rows=5570 width=8) (actual
>      time=0.15..1429696.69 rows=50000 loops=1)
>        Merge Cond: ("outer".product_id = "inner".product_id)
>        ->  Index Scan using bugs_product_id_idx on bugs
>            (cost=0.00..2313.33 rows=50000 width=4) (actual time=0.12..358.43
>            rows=50000 loops=1)
>        ->  Index Scan using bugs_product_id_idx on bugs
>            (cost=0.00..2313.33 rows=50000 width=4) (actual time=0.01..1152455.44
>            rows=277884160 loops=1)
             ^^^^^^^^^^^^^^

277884160 rows pulled from a 50000-row relation means a heck of a lot of
rescanning went on :-(  (The number is about what you'd expect, too: with
n rows spread evenly over d distinct join values, a mergejoin fetches
roughly n^2/d inner tuples, and 50000^2/277884160 back-solves to only
about nine distinct product_ids.)

The good news is that the system *is* aware of the small number of
distinct values in the table (note the dead-on estimate of the number of
distinct rows in your other query, which I think comes from new-for-7.4
code, though the required stats have been available since 7.2).

I think it'd probably be possible to make some reasonable estimate of the
amount of rescanning required, and then inflate the mergejoin cost
estimate proportionally.  I have not gotten around to looking at the
problem, though.  Care to take a shot at it?
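To give an idea of the shape of the estimate I have in mind, here is a
standalone sketch of the arithmetic (just a toy program assuming a flat
uniform distribution over the distinct values, driven by the kind of
number the n_distinct stats would give you; none of these names exist in
costsize.c):

#include <stdio.h>

/*
 * Hypothetical rescan estimate: with outer_rows spread evenly over
 * num_distinct join values, each outer tuple rescans its whole matching
 * inner group, so the inner scan fetches about
 * (outer_rows / num_distinct) * inner_rows tuples in total, instead of
 * just inner_rows for a duplicate-free merge.
 */
static double
estimated_inner_fetches(double outer_rows, double inner_rows,
                        double num_distinct)
{
    double avg_outer_group = outer_rows / num_distinct;

    return avg_outer_group * inner_rows;
}

int
main(void)
{
    double outer_rows = 50000.0;   /* bugs */
    double inner_rows = 50000.0;   /* bugs again (self-join on product_id) */
    double num_distinct = 9.0;     /* back-solved above from the actuals */

    double fetches = estimated_inner_fetches(outer_rows, inner_rows,
                                             num_distinct);

    printf("estimated inner tuples fetched: %.0f (actual: 277884160)\n",
           fetches);
    /* Factor to inflate the naive scan-the-inner-rel-once cost by. */
    printf("cost inflation factor: %.1f\n", fetches / inner_rows);
    return 0;
}

A real patch would presumably want to fold in the per-value frequencies
from pg_statistic rather than assume uniformity, but even this crude
version lands within a fraction of a percent of the observed fetch count.

			regards, tom lane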