Re: [BUGS] New hashed IN code ignores distinctiveness of subquery

Bradley Baetz Sun, 26 Jan 2003 20:50:37 -0800

On Sun, Jan 26, 2003 at 11:18:31PM -0500, Tom Lane wrote:
> Bradley Baetz <[EMAIL PROTECTED]> writes:
> > Right, or skip it entirely when selecting stuff with unique constraints.
> 
> I'm hesitant to do that until we have some scheme in place for
> invalidating cached plans.


By cached, do you mean PREPARE stuff, or something else?

> 
> > I don't think it is. The number of rows is correct if you do product_id
> > IN (1) vs product_id IN (1,2) vs product_id IN (1,2,3) and so on.
> 
> But that's a completely different code path; it doesn't even enter the
> routines we're concerned about here.

Yes, but its the same concept. Although we seem to be agreeing about
that now :)

> > What is the point of JOIN_UNIQUE_{INNER,OUTER}, though? What does it do
> > that JOIN_IN doesn't?
> 
> Uniqify the inner/outer path and then do a normal inner join.  See
> joinpath.c.

Ah, OK. If I comment out line 547 of joinrels.c (which adds JOIN_IN to
the set of join paths) so that the UNIQUE joins are all that are left to
try, then I get:

bbaetz=# explain analyze select count(*) FROM bugs where product_id IN
(SELECT product_id FROM bugs);
                                                                   QUERY
PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=3494816.98..3494816.98 rows=1 width=8) (actual
time=579.71..579.71 rows=1 loops=1)
   ->  Merge Join  (cost=5169.41..3494691.43 rows=50218 width=8) (actual
time=111.41..530.16 rows=50000 loops=1)
         Merge Cond: ("outer".product_id = "inner".product_id)
         ->  Index Scan using bugs_product_id_idx on bugs
(cost=0.00..1834.52 rows=50000 width=4) (actual time=0.13..249.57
rows=50000 loops=1)
         ->  Sort  (cost=920.14..920.17 rows=9 width=4) (actual
time=111.25..143.42 rows=44476 loops=1)
               Sort Key: public.bugs.product_id
               ->  HashAggregate  (cost=920.00..920.00 rows=9 width=4)
(actual time=111.17..111.18 rows=9 loops=1)
                     ->  Seq Scan on bugs  (cost=0.00..795.00 rows=50000
width=4) (actual time=0.00..67.41 rows=50000 loops=1)
 Total runtime: 579.84 msec
(9 rows)

(This isn't picked without my hack, because the cost is slightly higher
than the JOIN_IN version)

However, its much faster (although not as fast as sticking the DISTINCT
in there myself), but the actual rows coming from the sort is really odd
- where is that number coming from? How can sorting 9 rows take 44476
anythings? The final mergejoin cost is still way off, too.

>                       regards, tom lane

Thanks,

Bradley

---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly

Re: [BUGS] New hashed IN code ignores distinctiveness of subquery

Reply via email to