****
Em 22/11/2019 14:55, Pavel Stehule escreveu:


pá 22. 11. 2019 v 18:37 odesílatel Luís Roberto Weck <luisrobe...@siscobra.com.br <mailto:luisrobe...@siscobra.com.br>> napsal:

    Hey,

    I'm trying to figure out why Postgres is choosing a Hash Join
    over a Nested Loop in this query:

    SELECT T1.PesID, T1.PesNom, T1.PesValSal, T1.PesCPFCNPJ,
    T2.CarAti, T1.CarCod, T1.EmpCod,
           T2.CarFan, T1.PesDatAge, T1.PesCod, COALESCE(
    T3.PesDatAnt, DATE '00010101') AS PesDatAnt
      FROM ((public.Pessoa T1
                INNER JOIN public.Carteira T2 ON T2.EmpCod =
    T1.EmpCod AND T2.CarCod = T1.CarCod)
                 LEFT JOIN  (SELECT MIN(COALESCE( T5.ConVenAnt, DATE
    '00010101')) AS PesDatAnt, T4.EmpCod, T4.CarCod, T4.ConPesCod AS
    ConPesCod
                               FROM (public.Contrato T4
                                     LEFT JOIN (SELECT
    MIN(ConParDatVen) AS ConVenAnt, EmpCod, CarCod, ConPesCod, ConSeq
    FROM public.ContratoParcela T5
    WHERE ConParAti = true
    AND ConParValSal > 0
    GROUP BY EmpCod, CarCod, ConPesCod, ConSeq ) T5 ON T5.EmpCod    =
    T4.EmpCod    AND
    T5.CarCod    = T4.CarCod    AND
    T5.ConPesCod = T4.ConPesCod AND
    T5.ConSeq    = T4.ConSeq)
                              WHERE T4.ConAti = TRUE
    GROUP BY T4.EmpCod, T4.CarCod, T4.ConPesCod ) T3 ON t3.EmpCod   
    = T1.EmpCod AND
                              t3.CarCod    = T1.CarCod AND
                              t3.ConPesCod = T1.PesCod)
     WHERE (T2.CarAti = true)
       AND (T1.EmpCod = 112)
       and (UPPER(T1.PesNom) like UPPER('%MARIA%'))
     ORDER BY T1.EmpCod, T1.CarCod, T1.PesCod

    Here the Hash Join[1] plan takes ~700ms, and if I change the
    first LEFT JOIN to a LEFT JOIN LATERAL, forcing a nested loop,
    the query[2] runs in 3ms.

    [1] https://explain.depesz.com/s/8IL3
    [2] https://explain.depesz.com/s/f8Q9


Maybe I am wrong, but probably you have to do more than just change LEFT JOIN to LATERAL JOIN. Lateral join is based on correlated subquery - so you had to push some predicates to subquery - and then the query can be much more effective.

Regards

Pavel





    PostgreSQL version is 11.5, I have run analyze on all the tables.

    PG settings:

    name                           |setting  |unit|
    -------------------------------|---------|----|
    autovacuum                     |on       |    |
    default_statistics_target      |250      |    |
    effective_cache_size           |983040   |8kB |
    effective_io_concurrency       |200      |    |
    max_parallel_workers           |6        |    |
    max_parallel_workers_per_gather|3        |    |
    random_page_cost               |1.1      |    |
    work_mem                       |51200    |kB  |


I'm sorry, I am not sure I understood.

This is the altered query:

SELECT T1.PesID, T1.PesNom, T1.PesValSal, T1.PesCPFCNPJ, T2.CarAti, T1.CarCod, T1.EmpCod, T2.CarFan, T1.PesDatAge, T1.PesCod,
       COALESCE( T3.PesDatAnt, DATE '00010101') AS PesDatAnt
  FROM ((public.Pessoa T1
            INNER JOIN public.Carteira T2 ON T2.EmpCod = T1.EmpCod AND T2.CarCod = T1.CarCod)              LEFT JOIN *LATERAL *(SELECT MIN(COALESCE( T5.ConVenAnt, DATE '00010101')) AS PesDatAnt, T4.EmpCod, T4.CarCod, T4.ConPesCod AS ConPesCod
                                  FROM (public.Contrato T4
                                        LEFT JOIN  (SELECT MIN(ConParDatVen) AS ConVenAnt, EmpCod, CarCod, ConPesCod, ConSeq                                                      FROM public.ContratoParcela T5
                                                    WHERE ConParAti = true
                                                      and ConParValSal > 0

                                                    GROUP BY EmpCod, CarCod, ConPesCod, ConSeq ) T5 ON  T5.EmpCod = T4.EmpCod AND T5.CarCod = T4.CarCod AND T5.ConPesCod = T4.ConPesCod AND T5.ConSeq = T4.ConSeq)
                                 WHERE T4.ConAti = TRUE
*AND t4.EmpCod = T1.EmpCod AND t4.CarCod = T1.CarCod AND t4.ConPesCod = T1.PesCod*                                  GROUP BY T4.EmpCod, T4.CarCod, T4.ConPesCod ) T3 ON *TRUE ) --ON t3.EmpCod = T1.EmpCod AND t3.CarCod = T1.CarCod AND t3.ConPesCod = T1.PesCod) *
 WHERE (T2.CarAti = true)
   AND (T1.EmpCod = 112)
   and (UPPER(T1.PesNom) like UPPER('%MARIA%'))
 ORDER BY T1.EmpCod, T1.CarCod, T1.PesCod

In bold are the changes I've made to the query. I am sure PostgreSQL is able to push it down, since it is much faster now.The problem I have is that this is a query generated by an ORM, So I can't change it.

I would like to understand why wasn't Postgres able to optimize it to a nested loop. Is there something I can do with the statistics?

Thanks!!

Reply via email to