Kevin Grittner <kgri...@ymail.com> wrote: > I applied it to master and ran the regression tests, and one of > the subselect tests failed. > > This query: > > SELECT '' AS six, f1 AS "Correlated Field", f3 AS "Second > Field" > FROM SUBSELECT_TBL upper > WHERE f1 IN > (SELECT f2 FROM SUBSELECT_TBL WHERE CAST(upper.f2 AS float) = f3);
> [ ... ] during the `make check` or `make install-check` [ ... ] > is missing the last two rows. Oddly, if I go into the database > later and try it, the rows show up. It's not immediately > apparent to me what's wrong. Using the v2 patch, with the default statistics from table creation, the query modified with an alias of "lower" for the second reference, just for clarity, yields a plan which generates incorrect results: Hash Join (cost=37.12..80.40 rows=442 width=12) (actual time=0.059..0.064 rows=3 loops=1) Hash Cond: (((upper.f2)::double precision = lower.f3) AND (upper.f1 = lower.f2)) -> Seq Scan on subselect_tbl upper (cost=0.00..27.70 rows=1770 width=16) (actual time=0.006..0.007 rows=8 loops=1) -> Hash (cost=34.12..34.12 rows=200 width=12) (actual time=0.020..0.020 rows=5 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 1kB -> HashAggregate (cost=32.12..34.12 rows=200 width=12) (actual time=0.014..0.018 rows=6 loops=1) -> Seq Scan on subselect_tbl lower (cost=0.00..27.70 rows=1770 width=12) (actual time=0.002..0.004 rows=8 loops=1) Total runtime: 0.111 ms As soon as there is a VACUUM and/or ANALYZE it generates a plan more like what the OP was hoping for: Hash Semi Join (cost=1.20..2.43 rows=6 width=12) (actual time=0.031..0.036 rows=5 loops=1) Hash Cond: (((upper.f2)::double precision = lower.f3) AND (upper.f1 = lower.f2)) -> Seq Scan on subselect_tbl upper (cost=0.00..1.08 rows=8 width=16) (actual time=0.004..0.007 rows=8 loops=1) -> Hash (cost=1.08..1.08 rows=8 width=12) (actual time=0.012..0.012 rows=7 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 1kB -> Seq Scan on subselect_tbl lower (cost=0.00..1.08 rows=8 width=12) (actual time=0.003..0.005 rows=8 loops=1) Total runtime: 0.074 ms By comparison, without the patch this is the plan: Seq Scan on subselect_tbl upper (cost=0.00..5.59 rows=4 width=12) (actual time=0.022..0.037 rows=5 loops=1) Filter: (SubPlan 1) Rows Removed by Filter: 3 SubPlan 1 -> Seq Scan on subselect_tbl lower (cost=0.00..1.12 rows=1 width=4) (actual time=0.002..0.003 rows=1 loops=8) Filter: ((upper.f2)::double precision = f3) Rows Removed by Filter: 4 Total runtime: 0.066 ms When I run the query with fresh statistics and without EXPLAIN both ways, the unpatched is consistently about 10% faster. So pulling up the subquery can yield an incorrect plan, and even when it yields the "desired" plan with the semi-join it is marginally slower than using the subplan, at least for this small data set. I think it's an interesting idea, but it still needs work. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers