Re: [HACKERS] Reference to parent query from ANY sublink

Kevin Grittner Wed, 11 Dec 2013 12:15:14 -0800

Kevin Grittner <[email protected]> wrote:

> I applied it to master and ran the regression tests, and one of
> the subselect tests failed.
>
> This query:
>
> SELECT '' AS six, f1 AS "Correlated Field", f3 AS "Second
> Field"
>   FROM SUBSELECT_TBL upper
>   WHERE f1 IN
>     (SELECT f2 FROM SUBSELECT_TBL WHERE CAST(upper.f2 AS float) = f3);


> [ ... ] during the `make check` or `make install-check` [ ... ]
> is missing the last two rows.  Oddly, if I go into the database
> later and try it, the rows show up.  It's not immediately
> apparent to me what's wrong.

Using the v2 patch, with the default statistics from table
creation, the query modified with an alias of "lower" for the
second reference, just for clarity, yields a plan which generates
incorrect results:

 Hash Join  (cost=37.12..80.40 rows=442 width=12) (actual time=0.059..0.064 
rows=3 loops=1)
   Hash Cond: (((upper.f2)::double precision = lower.f3) AND (upper.f1 = 
lower.f2))
   ->  Seq Scan on subselect_tbl upper  (cost=0.00..27.70 rows=1770 width=16) 
(actual time=0.006..0.007 rows=8 loops=1)
   ->  Hash  (cost=34.12..34.12 rows=200 width=12) (actual time=0.020..0.020 
rows=5 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 1kB
         ->  HashAggregate  (cost=32.12..34.12 rows=200 width=12) (actual 
time=0.014..0.018 rows=6 loops=1)
               ->  Seq Scan on subselect_tbl lower  (cost=0.00..27.70 rows=1770 
width=12) (actual time=0.002..0.004 rows=8 loops=1)
 Total runtime: 0.111 ms

As soon as there is a VACUUM and/or ANALYZE it generates a plan
more like what the OP was hoping for:

 Hash Semi Join  (cost=1.20..2.43 rows=6 width=12) (actual time=0.031..0.036 
rows=5 loops=1)
   Hash Cond: (((upper.f2)::double precision = lower.f3) AND (upper.f1 = 
lower.f2))
   ->  Seq Scan on subselect_tbl upper  (cost=0.00..1.08 rows=8 width=16) 
(actual time=0.004..0.007 rows=8 loops=1)
   ->  Hash  (cost=1.08..1.08 rows=8 width=12) (actual time=0.012..0.012 rows=7 
loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 1kB
         ->  Seq Scan on subselect_tbl lower  (cost=0.00..1.08 rows=8 width=12) 
(actual time=0.003..0.005 rows=8 loops=1)
 Total runtime: 0.074 ms

By comparison, without the patch this is the plan:

 Seq Scan on subselect_tbl upper  (cost=0.00..5.59 rows=4 width=12) (actual 
time=0.022..0.037 rows=5 loops=1)
   Filter: (SubPlan 1)
   Rows Removed by Filter: 3
   SubPlan 1
     ->  Seq Scan on subselect_tbl lower  (cost=0.00..1.12 rows=1 width=4) 
(actual time=0.002..0.003 rows=1 loops=8)
           Filter: ((upper.f2)::double precision = f3)
           Rows Removed by Filter: 4
 Total runtime: 0.066 ms

When I run the query with fresh statistics and without EXPLAIN both
ways, the unpatched is consistently about 10% faster.

So pulling up the subquery can yield an incorrect plan, and even
when it yields the "desired" plan with the semi-join it is
marginally slower than using the subplan, at least for this small
data set.  I think it's an interesting idea, but it still needs
work.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Reference to parent query from ANY sublink

Reply via email to