Re: Hash Join over Nested Loop

Pavel Stehule Fri, 22 Nov 2019 10:56:30 -0800

pá 22. 11. 2019 v 19:42 odesílatel Luís Roberto Weck <
luisrobe...@siscobra.com.br> napsal:


> Em 22/11/2019 14:55, Pavel Stehule escreveu:
>
>
>
> pá 22. 11. 2019 v 18:37 odesílatel Luís Roberto Weck <
> luisrobe...@siscobra.com.br> napsal:
>
>> Hey,
>>
>> I'm trying to figure out why Postgres is choosing a Hash Join over a
>> Nested Loop in this query:
>>
>> SELECT T1.PesID, T1.PesNom, T1.PesValSal, T1.PesCPFCNPJ, T2.CarAti,
>> T1.CarCod, T1.EmpCod,
>>        T2.CarFan, T1.PesDatAge, T1.PesCod, COALESCE( T3.PesDatAnt, DATE
>> '00010101') AS PesDatAnt
>>   FROM ((public.Pessoa T1
>>             INNER JOIN public.Carteira T2 ON T2.EmpCod = T1.EmpCod AND
>> T2.CarCod = T1.CarCod)
>>              LEFT JOIN  (SELECT MIN(COALESCE( T5.ConVenAnt, DATE
>> '00010101')) AS PesDatAnt, T4.EmpCod, T4.CarCod, T4.ConPesCod AS ConPesCod
>>                            FROM (public.Contrato T4
>>                                  LEFT JOIN  (SELECT MIN(ConParDatVen) AS
>> ConVenAnt, EmpCod, CarCod, ConPesCod, ConSeq
>>                                                FROM
>> public.ContratoParcela T5
>>                                               WHERE ConParAti = true
>>                                                 AND ConParValSal > 0
>>                                               GROUP BY EmpCod, CarCod,
>> ConPesCod, ConSeq ) T5 ON T5.EmpCod    = T4.EmpCod    AND
>>
>>                          T5.CarCod    = T4.CarCod    AND
>>
>>                          T5.ConPesCod = T4.ConPesCod AND
>>
>>                          T5.ConSeq    = T4.ConSeq)
>>                           WHERE T4.ConAti = TRUE
>>                           GROUP BY T4.EmpCod, T4.CarCod, T4.ConPesCod )
>> T3 ON t3.EmpCod    = T1.EmpCod AND
>>
>>       t3.CarCod    = T1.CarCod AND
>>
>>       t3.ConPesCod = T1.PesCod)
>>  WHERE (T2.CarAti = true)
>>    AND (T1.EmpCod = 112)
>>    and (UPPER(T1.PesNom) like UPPER('%MARIA%'))
>>  ORDER BY T1.EmpCod, T1.CarCod, T1.PesCod
>>
>> Here the Hash Join[1] plan takes ~700ms, and if I change the first LEFT
>> JOIN to a LEFT JOIN LATERAL, forcing a nested loop, the query[2] runs in
>> 3ms.
>>
>> [1] https://explain.depesz.com/s/8IL3
>> [2] https://explain.depesz.com/s/f8Q9
>>
>>
> Maybe I am wrong, but probably you have to do more than just change LEFT
> JOIN to LATERAL JOIN. Lateral join is based on correlated subquery - so you
> had to push some predicates to subquery - and then the query can be much
> more effective.
>
> Regards
>
> Pavel
>
>
>
>
>
>> PostgreSQL version is 11.5, I have run analyze on all the tables.
>>
>> PG settings:
>>
>> name                           |setting  |unit|
>> -------------------------------|---------|----|
>> autovacuum                     |on       |    |
>> default_statistics_target      |250      |    |
>> effective_cache_size           |983040   |8kB |
>> effective_io_concurrency       |200      |    |
>> max_parallel_workers           |6        |    |
>> max_parallel_workers_per_gather|3        |    |
>> random_page_cost               |1.1      |    |
>> work_mem                       |51200    |kB  |
>>
>
> I'm sorry, I am not sure I understood.
>
> This is the altered query:
>
> SELECT T1.PesID, T1.PesNom, T1.PesValSal, T1.PesCPFCNPJ, T2.CarAti,
> T1.CarCod, T1.EmpCod, T2.CarFan, T1.PesDatAge, T1.PesCod,
>        COALESCE( T3.PesDatAnt, DATE '00010101') AS PesDatAnt
>   FROM ((public.Pessoa T1
>             INNER JOIN public.Carteira T2 ON T2.EmpCod = T1.EmpCod AND
> T2.CarCod = T1.CarCod)
>              LEFT JOIN *LATERAL *(SELECT MIN(COALESCE( T5.ConVenAnt, DATE
> '00010101')) AS PesDatAnt, T4.EmpCod, T4.CarCod, T4.ConPesCod AS ConPesCod
>                                   FROM (public.Contrato T4
>                                         LEFT JOIN  (SELECT
> MIN(ConParDatVen) AS ConVenAnt, EmpCod, CarCod, ConPesCod, ConSeq
>                                                      FROM
> public.ContratoParcela T5
>                                                     WHERE ConParAti = true
>                                                       and ConParValSal > 0
>
>                                                     GROUP BY EmpCod,
> CarCod, ConPesCod, ConSeq ) T5 ON  T5.EmpCod = T4.EmpCod AND T5.CarCod =
> T4.CarCod AND T5.ConPesCod = T4.ConPesCod AND T5.ConSeq = T4.ConSeq)
>                                  WHERE T4.ConAti = TRUE
>                                    *AND t4.EmpCod = T1.EmpCod AND
> t4.CarCod = T1.CarCod AND t4.ConPesCod = T1.PesCod*
>                                  GROUP BY T4.EmpCod, T4.CarCod,
> T4.ConPesCod ) T3 ON *TRUE ) --ON t3.EmpCod = T1.EmpCod AND t3.CarCod =
> T1.CarCod AND t3.ConPesCod = T1.PesCod) *
>  WHERE (T2.CarAti = true)
>    AND (T1.EmpCod = 112)
>    and (UPPER(T1.PesNom) like UPPER('%MARIA%'))
>  ORDER BY T1.EmpCod, T1.CarCod, T1.PesCod
>
> In bold are the changes I've made to the query. I am sure PostgreSQL is
> able to push it down, since it is much faster now. The problem I have is
> that this is a query generated by an ORM, So I can't change it.
>
> I would like to understand why wasn't Postgres able to optimize it to a
> nested loop. Is there something I can do with the statistics?
>

I don't think. Postgres optimizer just doesn't support this optimization.
It has sense only when you know so number of loops is very small - else
nested loop should be much slower.

Regards

Pavel



> Thanks!!
>
>

Re: Hash Join over Nested Loop

Reply via email to