Hi Alena. Em ter., 24 de dez. de 2024 às 01:44, Alena Rybakina < a.rybak...@postgrespro.ru> escreveu:
> Hi, hackers! > > I found one pull-up that works if the inner join condition is written > through the where condition, > > create temp table ta (id int primary key, val int); > insert into ta values(1,1); > insert into ta values(2,2);insert into ta values(3,3); > create temp table tb (id int primary key, aval int); > insert into tb values(4,1); > insert into tb values(5,1); > insert into tb values(1,2); > > create temp table tc (id int primary key, aid int); > insert into tc values(6,1); > insert into tc values(7,2); > > EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF) > SELECT * > FROM ta > WHERE EXISTS (SELECT * > FROM tb, tc > WHERE ta.id = tb.id); > QUERY PLAN > ------------------------------------------------------------------------- > Nested Loop Semi Join (actual rows=1 loops=1) > Buffers: local hit=6 > -> Seq Scan on ta (actual rows=3 loops=1) > Buffers: local hit=1 > -> Nested Loop (actual rows=0 loops=3) > Buffers: local hit=5 > -> Index Only Scan using tb_pkey on tb (actual rows=0 loops=3) > Index Cond: (id = ta.id) > Heap Fetches: 1 > Buffers: local hit=4 > -> Seq Scan on tc (actual rows=1 loops=1) > Buffers: local hit=1 > Planning: > Buffers: shared hit=67 read=12 > (14 rows) > > but it doesn't work if it is written through the outside condition. > > alena@postgres=# EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF) > SELECT * > FROM ta > WHERE EXISTS (SELECT * > FROM tb JOIN tc > ON ta.id = tb.id); > QUERY PLAN > ------------------------------------------------------ > Seq Scan on ta (actual rows=1 loops=1) > Filter: EXISTS(SubPlan 1) > Rows Removed by Filter: 2 > Buffers: local hit=5 > SubPlan 1 > -> Nested Loop (actual rows=0 loops=3) > Buffers: local hit=4 > -> Seq Scan on tb (actual rows=0 loops=3) > Filter: (ta.id = id) > Rows Removed by Filter: 3 > Buffers: local hit=3 > -> Seq Scan on tc (actual rows=1 loops=1) > Buffers: local hit=1 > Planning: > Buffers: shared hit=16 read=9 > (15 rows) > > > I have written a patch to add this functionality and now it gives an query > plan: > > alena@postgres=# EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF) > SELECT * > FROM ta > WHERE EXISTS (SELECT * > FROM tb JOIN tc > ON ta.id = tb.id); > QUERY PLAN > ------------------------------------------------------------------------- > Nested Loop Semi Join (actual rows=1 loops=1) > Buffers: local hit=6 > -> Seq Scan on ta (actual rows=3 loops=1) > Buffers: local hit=1 > -> Nested Loop (actual rows=0 loops=3) > Buffers: local hit=5 > -> Index Only Scan using tb_pkey on tb (actual rows=0 loops=3) > Index Cond: (id = ta.id) > Heap Fetches: 1 > Buffers: local hit=4 > -> Seq Scan on tc (actual rows=1 loops=1) > Buffers: local hit=1 > (12 rows) > > tb and tc form a Cartesian product, but in the case of the intersection > condition with tuples from the table ta (ta.id = tb.id). So, according to > the join condition, tb intersects only with 1, and only it gets into the > result, but at the same time they appear twice - this is because of the > Cartesian product of tb with tc > *How it works:* > > I rewrote the code a bit so that it considers not only the quals in > jointree->quals, but also those in join expression > (subselect->jointree->fromlist). If they satisfy the conditions for using > pull up, I add them to the list of clauses and form a "Bool" expression > from them, joined by an "AND" operation. > I took a look at this patch and I did a little polishing on it. And I believe that in testing, you need to set it to BUFFERS OFF, because of the recent change made to ANALYZE. The tests are failing, like this: QUERY PLAN ------------------------------------------------------------------------- Nested Loop Semi Join (actual rows=2 loops=1) + Buffers: local hit=7 -> Seq Scan on ta (actual rows=2 loops=1) + Buffers: local hit=1 -> Nested Loop (actual rows=1 loops=2) + Buffers: local hit=6 -> Index Only Scan using tb_pkey on tb (actual rows=1 loops=2) Index Cond: (id = ta.id) Heap Fetches: 2 + Buffers: local hit=4 -> Seq Scan on tc (actual rows=1 loops=2) -(7 rows) + Buffers: local hit=2 +(12 rows) best regards, Ranier Vilela
v1-0001-Add-EXISTS-pull-up-if-subquery-join-expressions.patch
Description: Binary data