On 24.12.2024 13:25, Ranier Vilela wrote:
Hi Alena.
Em ter., 24 de dez. de 2024 às 01:44, Alena Rybakina
<a.rybak...@postgrespro.ru> escreveu:
Hi, hackers!
I found one pull-up that works if the inner join condition is
written through the where condition,
|create temp table ta (id int primary key, val int); insert into
ta values(1,1); insert into ta values(2,2); ||insert into ta values(3,3);|
|create temp table tb (id int primary key, aval int); insert into
tb values(4,1); insert into tb values(5,1); insert into tb
values(1,2); create temp table tc (id int primary key, aid int);
insert into tc values(6,1); insert into tc values(7,2);|
|EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF) SELECT *
FROM ta WHERE EXISTS (SELECT * FROM tb, tc WHERE ta.id
<http://ta.id> = tb.id <http://tb.id>);|
QUERY PLAN
-------------------------------------------------------------------------
Nested Loop Semi Join (actual rows=1 loops=1)
Buffers: local hit=6
-> Seq Scan on ta (actual rows=3 loops=1)
Buffers: local hit=1
-> Nested Loop (actual rows=0 loops=3)
Buffers: local hit=5
-> Index Only Scan using tb_pkey on tb (actual rows=0 loops=3)
Index Cond: (id =ta.id <http://ta.id>)
Heap Fetches: 1
Buffers: local hit=4
-> Seq Scan on tc (actual rows=1 loops=1)
Buffers: local hit=1
Planning:
Buffers: shared hit=67 read=12
(14 rows)
but it doesn't work if it is written through the outside condition.
|alena@postgres=# EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING
OFF) SELECT * FROM ta WHERE EXISTS (SELECT * FROM tb JOIN tc ON
ta.id <http://ta.id> = tb.id <http://tb.id>); QUERY PLAN
------------------------------------------------------ Seq Scan on
ta (actual rows=1 loops=1) Filter: EXISTS(SubPlan 1) Rows Removed
by Filter: 2 Buffers: local hit=5 SubPlan 1 -> Nested Loop (actual
rows=0 loops=3) Buffers: local hit=4 -> Seq Scan on tb (actual
rows=0 loops=3) Filter: (ta.id <http://ta.id> = id) Rows Removed
by Filter: 3 Buffers: local hit=3 -> Seq Scan on tc (actual rows=1
loops=1) Buffers: local hit=1 Planning: Buffers: shared hit=16
read=9 (15 rows) |
|I have written a patch to add this functionality and now it gives
an query plan: |
|alena@postgres=# EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING
OFF)
SELECT *
FROM ta
WHERE EXISTS (SELECT *
FROM tb JOIN tc
ON ta.id <http://ta.id> = tb.id <http://tb.id>);
QUERY PLAN
-------------------------------------------------------------------------
Nested Loop Semi Join (actual rows=1 loops=1)
Buffers: local hit=6
-> Seq Scan on ta (actual rows=3 loops=1)
Buffers: local hit=1
-> Nested Loop (actual rows=0 loops=3)
Buffers: local hit=5
-> Index Only Scan using tb_pkey on tb (actual rows=0
loops=3)
Index Cond: (id = ta.id <http://ta.id>)
Heap Fetches: 1
Buffers: local hit=4
-> Seq Scan on tc (actual rows=1 loops=1)
Buffers: local hit=1
(12 rows)|
tb and tc form a Cartesian product, but in the case of the
intersection condition with tuples from the table ta (ta.id
<http://ta.id> = tb.id <http://tb.id>). So, according to the join
condition, tb intersects only with 1, and only it gets into the
result, but at the same time they appear twice - this is because
of the Cartesian product of tb with tc
|*How it works:*
|
I rewrote the code a bit so that it considers not only the quals
in jointree->quals, but also those in join expression
(subselect->jointree->fromlist). If they satisfy the conditions
for using pull up, I add them to the list of clauses and form a
"Bool" expression from them, joined by an "AND" operation.
I took a look at this patch and I did a little polishing on it.
And I believe that in testing, you need to set it to BUFFERS OFF,
because of the recent change made to ANALYZE.
The tests are failing, like this:
QUERY PLAN
-------------------------------------------------------------------------
Nested Loop Semi Join (actual rows=2 loops=1)
+ Buffers: local hit=7
-> Seq Scan on ta (actual rows=2 loops=1)
+ Buffers: local hit=1
-> Nested Loop (actual rows=1 loops=2)
+ Buffers: local hit=6
-> Index Only Scan using tb_pkey on tb (actual rows=1 loops=2)
Index Cond: (id = ta.id <http://ta.id>)
Heap Fetches: 2
+ Buffers: local hit=4
-> Seq Scan on tc (actual rows=1 loops=2)
-(7 rows)
+ Buffers: local hit=2
+(12 rows)
Yes, you are right) Thank you for your interest to this thread)
--
Regards,
Alena Rybakina
Postgres Professional