On 24.12.2024 13:25, Ranier Vilela wrote:
Hi Alena.

Em ter., 24 de dez. de 2024 às 01:44, Alena Rybakina <a.rybak...@postgrespro.ru> escreveu:

    Hi, hackers!

    I found one pull-up that works if the inner join condition is
    written through the where condition,

    |create temp table ta (id int primary key, val int); insert into
    ta values(1,1); insert into ta values(2,2); ||insert into ta values(3,3);|

    |create temp table tb (id int primary key, aval int); insert into
    tb values(4,1); insert into tb values(5,1); insert into tb
    values(1,2); create temp table tc (id int primary key, aid int);
    insert into tc values(6,1); insert into tc values(7,2);|

    |EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF) SELECT *
    FROM ta WHERE EXISTS (SELECT * FROM tb, tc WHERE ta.id
    <http://ta.id> = tb.id <http://tb.id>);|
                                    QUERY PLAN
    -------------------------------------------------------------------------
      Nested Loop Semi Join (actual rows=1 loops=1)
        Buffers: local hit=6
        ->  Seq Scan on ta (actual rows=3 loops=1)
              Buffers: local hit=1
        ->  Nested Loop (actual rows=0 loops=3)
              Buffers: local hit=5
              ->  Index Only Scan using tb_pkey on tb (actual rows=0 loops=3)
                    Index Cond: (id =ta.id  <http://ta.id>)
                    Heap Fetches: 1
                    Buffers: local hit=4
              ->  Seq Scan on tc (actual rows=1 loops=1)
                    Buffers: local hit=1
      Planning:
        Buffers: shared hit=67 read=12
    (14 rows)

    but it doesn't work if it is written through the outside condition.

    |alena@postgres=# EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING
    OFF) SELECT * FROM ta WHERE EXISTS (SELECT * FROM tb JOIN tc ON
    ta.id <http://ta.id> = tb.id <http://tb.id>); QUERY PLAN
    ------------------------------------------------------ Seq Scan on
    ta (actual rows=1 loops=1) Filter: EXISTS(SubPlan 1) Rows Removed
    by Filter: 2 Buffers: local hit=5 SubPlan 1 -> Nested Loop (actual
    rows=0 loops=3) Buffers: local hit=4 -> Seq Scan on tb (actual
    rows=0 loops=3) Filter: (ta.id <http://ta.id> = id) Rows Removed
    by Filter: 3 Buffers: local hit=3 -> Seq Scan on tc (actual rows=1
    loops=1) Buffers: local hit=1 Planning: Buffers: shared hit=16
    read=9 (15 rows) |

    |I have written a patch to add this functionality and now it gives
    an query plan: |

    |alena@postgres=# EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING
    OFF)
     SELECT *
       FROM ta
      WHERE EXISTS (SELECT *
                      FROM tb JOIN tc
                      ON ta.id <http://ta.id> = tb.id <http://tb.id>);
                         QUERY PLAN
    -------------------------------------------------------------------------
     Nested Loop Semi Join (actual rows=1 loops=1)
       Buffers: local hit=6
       ->  Seq Scan on ta (actual rows=3 loops=1)
             Buffers: local hit=1
       ->  Nested Loop (actual rows=0 loops=3)
             Buffers: local hit=5
             ->  Index Only Scan using tb_pkey on tb (actual rows=0
    loops=3)
                   Index Cond: (id = ta.id <http://ta.id>)
                   Heap Fetches: 1
                   Buffers: local hit=4
             ->  Seq Scan on tc (actual rows=1 loops=1)
                   Buffers: local hit=1
    (12 rows)|

    tb and tc form a Cartesian product, but in the case of the
    intersection condition with tuples from the table ta (ta.id
    <http://ta.id> = tb.id <http://tb.id>). So, according to the join
    condition, tb intersects only with 1, and only it gets into the
    result, but at the same time they appear twice - this is because
    of the Cartesian product of tb with tc

    |*How it works:*
    |

    I rewrote the code a bit so that it considers not only the quals
    in jointree->quals, but also those in join expression
    (subselect->jointree->fromlist). If they satisfy the conditions
    for using pull up, I add them to the list of clauses and form a
    "Bool" expression from them, joined by an "AND" operation.

I took a look at this patch and I did a little polishing on it.

And I believe that in testing, you need to set it to BUFFERS OFF,
because of the recent change made to ANALYZE.

The tests are failing, like this:
QUERY PLAN
 -------------------------------------------------------------------------
 Nested Loop Semi Join (actual rows=2 loops=1)
+ Buffers: local hit=7
 -> Seq Scan on ta (actual rows=2 loops=1)
+ Buffers: local hit=1
 -> Nested Loop (actual rows=1 loops=2)
+ Buffers: local hit=6
 -> Index Only Scan using tb_pkey on tb (actual rows=1 loops=2)
 Index Cond: (id = ta.id <http://ta.id>)
 Heap Fetches: 2
+ Buffers: local hit=4
 -> Seq Scan on tc (actual rows=1 loops=2)
-(7 rows)
+ Buffers: local hit=2
+(12 rows)

Yes, you are right) Thank you for your interest to this thread)

--
Regards,
Alena Rybakina
Postgres Professional

Reply via email to