[HACKERS] convert EXSITS to inner join gotcha and bug

Teodor Sigaev Fri, 28 Apr 2017 02:12:07 -0700

Hi!

Seems, there two issues:

1) Sometime conditions which for a first glance could be pushed down to scan areleaved as join quals. And it could be a ~30 times performance loss.


2) Number of query result depend on enabe_seqscan variable.

The query
explain analyze
SELECT
        *
FROM
        t1
        INNER JOIN t2 ON (
                EXISTS (
                        SELECT
                                true
                        FROM
                                t3
                        WHERE
                                t3.id1 = t1.id  AND
                                t3.id2 = t2.id
                )
        )
WHERE
        t1.name = '5c5fec6a41b8809972870abc154b3ecd'
;

produces following plan:

Nested Loop (cost=6.42..1928.71 rows=1 width=99) (actual time=71.415..148.922rows=162 loops=1)

   Join Filter: (t3.id1 = t1.id)
   Rows Removed by Join Filter: 70368

-> Index Only Scan using t1i2 on t1 (cost=0.28..8.30 rows=1 width=66)(actual time=0.100..0.103 rows=1 loops=1)

         Index Cond: (name = '5c5fec6a41b8809972870abc154b3ecd'::text)
         Heap Fetches: 1

-> Hash Join (cost=6.14..1918.37 rows=163 width=66) (actualtime=0.370..120.971 rows=70530 loops=1)

(1)      Hash Cond: (t3.id2 = t2.id)

(2) -> Seq Scan on t3 (cost=0.00..1576.30 rows=70530 width=66) (actualtime=0.017..27.424 rows=70530 loops=1)-> Hash (cost=3.84..3.84 rows=184 width=33) (actualtime=0.273..0.273 rows=184 loops=1)

               Buckets: 1024  Batches: 1  Memory Usage: 20kB

-> Seq Scan on t2 (cost=0.00..3.84 rows=184 width=33) (actualtime=0.017..0.105 rows=184 loops=1)

 Planning time: 7.326 ms
 Execution time: 149.115 ms

Condition (1) is not pushed to scan (2) which seemsly could be safely moved.With seqscan = off condition is not pushed too but query returns only one rowinstead of 162. Scan on t3 returns ~70000 rows but only ~150 rows are reallyneeded. I didn't found a combination of GUCs enable_* to push down that and itseems to me there is reason for that which I don't see or support is somehow missed.

If pair of (t3.id1, t3.id2) is unique (see dump, there is a unique index onthem) the query could be directly rewrited to inner join and its plan is:Nested Loop (cost=9.70..299.96 rows=25 width=66) (actual time=0.376..5.232rows=162 loops=1)-> Nested Loop (cost=9.43..292.77 rows=25 width=99) (actualtime=0.316..0.645 rows=162 loops=1)-> Index Only Scan using t1i2 on t1 (cost=0.28..8.30 rows=1width=66) (actual time=0.047..0.050 rows=1 loops=1)

               Index Cond: (name = '5c5fec6a41b8809972870abc154b3ecd'::text)
               Heap Fetches: 1

-> Bitmap Heap Scan on t3 (cost=9.15..283.53 rows=94 width=66)(actual time=0.257..0.426 rows=162 loops=1)

               Recheck Cond: (id1 = t1.id)
               Heap Blocks: exact=3

-> Bitmap Index Scan on t3i1 (cost=0.00..9.12 rows=94 width=0)(actual time=0.186..0.186 rows=162 loops=1)

                     Index Cond: (id1 = t1.id)

-> Index Only Scan using t2i1 on t2 (cost=0.27..0.29 rows=1 width=33)(actual time=0.024..0.024 rows=1 loops=162)

         Index Cond: (id = t3.id2)
         Heap Fetches: 162
 Planning time: 5.532 ms
 Execution time: 5.457 ms

Second plan is ~30 times faster. But with turned off sequentual scan the firstquery is not work correctly, which points to some bug in planner, I suppose.Both 9.6 and 10devel are affected to addiction of query result on seqscan variable.



Dump to reproduce (subset of real data but obfucated), queries are in attachment
http://sigaev.ru/misc/exists_to_nested.sql.gz
--
Teodor Sigaev                                   E-mail: [email protected]
                                                   WWW: http://www.sigaev.ru/

--query returns 162 rows
explain analyze
SELECT
        *
FROM
        t1
        INNER JOIN t2 ON (
                EXISTS (
                        SELECT
                                true
                        FROM
                                t3
                        WHERE
                                t3.id1 = t1.id  AND
                                t3.id2 = t2.id
                )
        )
WHERE
        t1.name = '5c5fec6a41b8809972870abc154b3ecd'
;

set enable_seqscan=off;

--the same query returns only one row!
explain analyze
SELECT
        *
FROM
        t1
        INNER JOIN t2 ON (
                EXISTS (
                        SELECT
                                true
                        FROM
                                t3
                        WHERE
                                t3.id1 = t1.id  AND
                                t3.id2 = t2.id
                )
        )
WHERE
        t1.name = '5c5fec6a41b8809972870abc154b3ecd'
;

explain analyze
SELECT
    t1.*
FROM
        t1, t2, t3
WHERE
        t1.name = '5c5fec6a41b8809972870abc154b3ecd' AND
        t3.id1 = t1.id  AND
        t3.id2 = t2.id;

-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] convert EXSITS to inner join gotcha and bug

Reply via email to