Converting nested loop to hash join for the IS NOT DISTINCT FROM case
Hi,

I have a join query with an IS NOT DISTINCT FROM criterion (select * from table1 join table2 on a is not distinct from b). On examining it, it is the nested loop join that makes the query slow. For a join with an 'is not distinct from' qual, can't the planner choose a hash join over a nested loop, or would there be a problem in doing so? Kindly enlighten me about the same.

Thanks,
Pradeep
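One workaround sometimes used for this, sketched below under the assumption that a sentinel value such as -1 never occurs in either column (the sentinel is hypothetical; the table and column names are the ones from the question), is to rewrite the null-safe comparison as a plain equality on coalesced values, which gives the planner a hashable join qual.

-- Original form: the IS NOT DISTINCT FROM qual is typically not hashable,
-- so the planner tends to fall back to a nested loop.
EXPLAIN
SELECT *
FROM table1
JOIN table2 ON table1.a IS NOT DISTINCT FROM table2.b;

-- Rewritten form: plain equality on coalesced values can use a hash join.
-- Assumes -1 never appears in table1.a or table2.b; if it can, choose a
-- different sentinel, otherwise the two queries are not equivalent.
EXPLAIN
SELECT *
FROM table1
JOIN table2 ON coalesce(table1.a, -1) = coalesce(table2.b, -1);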
Ambiguous Plan - Larger Table on Hash Side
Hi,

Recently I came across a case where the planner chose the larger table on the hash side. I am not sure whether this is intended behavior or whether we are missing something.

I have two tables (a and b), each with a single column. Table 'a' is large, with around 30 million distinct rows, and table 'b' has merely 70,000 rows, of which one-seventh (10,000) are distinct. I have analyzed both tables. But when joining the two tables I get the larger table on the hash side.

tpch=# explain select b from b left join a on a = b;
                                    QUERY PLAN
-----------------------------------------------------------------------------------
 Hash Left Join  (cost=824863.75..950104.42 rows=78264 width=4)
   Hash Cond: (b.b = a.a)
   ->  Foreign Scan on b  (cost=0.00..821.64 rows=78264 width=4)
         CStore File: /home/likewise-open/pg96/data/cstore_fdw/1818708/1849879
         CStore File Size: 314587
   ->  Hash  (cost=321721.22..321721.22 rows=30667722 width=4)
         ->  Foreign Scan on a  (cost=0.00..321721.22 rows=30667722 width=4)
               CStore File: /home/likewise-open/pg96/data/cstore_fdw/1818708/1849876
               CStore File Size: 123236206
(9 rows)

I would like to know the reason for choosing this plan, and is there an easy fix to prevent such plans (especially ones like this, where the larger table ends up in the hash)?

Thanks,
Pradeep
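One thing that can be checked here, sketched below with the table names from the plan above, is what statistics the planner actually has for the two relations, since the row-count and distinct-value estimates are what drive the choice of hash side. The queries use the standard pg_stats and pg_class catalogs; running ANALYZE on the foreign tables assumes the FDW supports it, as the message above indicates it does.

-- Distinct-value and null-fraction estimates the planner has for each column
-- (a negative n_distinct means a fraction of the table's row count).
SELECT tablename, attname, n_distinct, null_frac
FROM pg_stats
WHERE tablename IN ('a', 'b');

-- Row-count and size estimates currently stored for both relations.
SELECT relname, reltuples, relpages
FROM pg_class
WHERE relname IN ('a', 'b');

-- Refresh the statistics if either table looks stale.
ANALYZE a;
ANALYZE b;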
Re: Ambiguous Plan - Larger Table on Hash Side
Hi,

Thanks everyone for your suggestions. I would like to add EXPLAIN ANALYZE output for both plans so that we have a broader picture. I have a work_mem of 1000 MB.

The plan we get regularly, with the tables analyzed:

tpch=# explain analyze select b from tab2 left join tab1 on a = b;
                                    QUERY PLAN
-----------------------------------------------------------------------------------
 Hash Left Join  (cost=945515.68..1071064.34 rows=78264 width=4) (actual time=9439.410..20445.620 rows=78264 loops=1)
   Hash Cond: (tab2.b = tab1.a)
   ->  Seq Scan on tab2  (cost=0.00..1129.64 rows=78264 width=4) (actual time=0.006..5.116 rows=78264 loops=1)
   ->  Hash  (cost=442374.30..442374.30 rows=30667630 width=4) (actual time=9133.593..9133.593 rows=30667722 loops=1)
         Buckets: 33554432  Batches: 2  Memory Usage: 801126kB
         ->  Seq Scan on tab1  (cost=0.00..442374.30 rows=30667630 width=4) (actual time=0.030..3584.652 rows=30667722 loops=1)
 Planning time: 0.055 ms
 Execution time: 20472.603 ms
(8 rows)

I reproduced the other plan by not analyzing the smaller table:

tpch=# explain analyze select b from tab2 left join tab1 on a = b;
                                    QUERY PLAN
-----------------------------------------------------------------------------------
 Hash Right Join  (cost=2102.88..905274.97 rows=78039 width=4) (actual time=15.331..7590.406 rows=78264 loops=1)
   Hash Cond: (tab1.a = tab2.b)
   ->  Seq Scan on tab1  (cost=0.00..442375.48 rows=30667748 width=4) (actual time=0.046..2697.480 rows=30667722 loops=1)
   ->  Hash  (cost=1127.39..1127.39 rows=78039 width=4) (actual time=15.133..15.133 rows=78264 loops=1)
         Buckets: 131072  Batches: 1  Memory Usage: 3776kB
         ->  Seq Scan on tab2  (cost=0.00..1127.39 rows=78039 width=4) (actual time=0.009..5.516 rows=78264 loops=1)
 Planning time: 0.053 ms
 Execution time: 7592.688 ms
(8 rows)

The plan the planner actually chooses turns out to be the slower one. In the smaller table (tab2) each row is duplicated exactly 8 times, and all the rows in the larger table (tab1) are distinct. What may be the exact reason, and can we fix this?

P.S. I have also attached an SQL file to reproduce this.

On Tue, 13 Mar 2018 12:42:12 +0530, Ashutosh Bapat <ashutosh.ba...@enterprisedb.com> wrote:

On Mon, Mar 12, 2018 at 10:02 PM, Narendra Pradeep U U <narendra.prad...@zohocorp.com> wrote:
> Hi,
>
> Recently I came across a case where the planner chose the larger table on
> the hash side. I am not sure whether it is intended behavior or we are
> missing something.
>
> I have two tables (a and b), each with a single column. Table 'a' is large,
> with around 30 million distinct rows, and table 'b' has merely 70,000 rows,
> of which one-seventh (10,000) are distinct. I have analyzed both tables.
> But when joining the two tables I get the larger table on the hash side.
>
> tpch=# explain select b from b left join a on a = b;
>                                     QUERY PLAN
> -----------------------------------------------------------------------------------
>  Hash Left Join  (cost=824863.75..950104.42 rows=78264 width=4)
>    Hash Cond: (b.b = a.a)
>    ->  Foreign Scan on b  (cost=0.00..821.64 rows=78264 width=4)
>          CStore File: /home/likewise-open/pg96/data/cstore_fdw/1818708/1849879
>          CStore File Size: 314587
>    ->  Hash  (cost=321721.22..321721.22 rows=30667722 width=4)
>          ->  Foreign Scan on a  (cost=0.00..321721.22 rows=30667722 width=4)
>                CStore File: /home/likewise-open/pg96/data/cstore_fdw/1818708/1849876
>                CStore File Size: 123236206
> (9 rows)
>
> I would like to know the reason for choosing this plan, and is there an easy
> fix to prevent such plans (especially ones like this, where it chooses a
> larger hash table)?

A plan with the larger table being hashed isn't necessarily a badly performing one. During partition-wise join analysis I have seen plans with the larger table being hashed perform better than plans with the smaller table being hashed.
But I have seen it the other way around as well. However, I don't know an easy way to force which side of the join gets hashed; I tried that under the debugger. In your case, if you run EXPLAIN ANALYZE on this query and produce outputs of the two plans, one with the larger table being hashed and the other with the smaller one being hashed, you will see which of them performs better.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Attachment: join_plan.sql (binary data)
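The attached join_plan.sql is not reproduced in the thread; the following is a hypothetical sketch of a data set matching the description above (tab1 with roughly 30 million distinct values, tab2 with each value duplicated 8 times), intended only to make the shape of the problem reproducible.

-- Hypothetical reproduction of the data distribution described in the thread.
CREATE TABLE tab1 (a int);
CREATE TABLE tab2 (b int);

-- Larger table: ~30 million distinct values.
INSERT INTO tab1
SELECT g FROM generate_series(1, 30000000) AS g;

-- Smaller table: ~10,000 distinct values, each duplicated 8 times.
INSERT INTO tab2
SELECT g FROM generate_series(1, 10000) AS g, generate_series(1, 8);

ANALYZE tab1;
ANALYZE tab2;

EXPLAIN ANALYZE
SELECT b FROM tab2 LEFT JOIN tab1 ON a = b;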
Re: Ambiguous Plan - Larger Table on Hash Side
Hi Jeff,

I repeated the same query with a work_mem of 2000 MB. It is faster than the run with two batches, but still slower than hashing the smaller table. So in this case it makes more sense to hash the smaller table (less execution time and a smaller hash table).

EXPLAIN ANALYZE with the higher work_mem (2 GB):

tpch=# explain analyze select b from tab2 left join tab1 on a = b;
                                    QUERY PLAN
-----------------------------------------------------------------------------------
 Hash Left Join  (cost=825722.33..830863.00 rows=78264 width=4) (actual time=7876.746..7905.323 rows=78264 loops=1)
   Hash Cond: (tab2.b = tab1.a)
   ->  Seq Scan on tab2  (cost=0.00..1129.64 rows=78264 width=4) (actual time=0.007..4.268 rows=78264 loops=1)
   ->  Hash  (cost=442375.48..442375.48 rows=30667748 width=4) (actual time=7834.214..7834.214 rows=30667722 loops=1)
         Buckets: 33554432  Batches: 1  Memory Usage: 1340307kB
         ->  Seq Scan on tab1  (cost=0.00..442375.48 rows=30667748 width=4) (actual time=0.047..2267.995 rows=30667722 loops=1)
 Planning time: 0.052 ms
 Execution time: 7953.913 ms
(8 rows)

Yeah, EXPLAIN ANALYZE distorted the time; the actual query is twice as fast. Still, 2 GB of work_mem is not a feasible idea. Is there a way to avoid such plans (especially the larger table ending up on the hash side), or is there any workaround available?

Thanks,
Pradeep

On Thu, 15 Mar 2018 00:40:27 +0530, Jeff Janes <jeff.ja...@gmail.com> wrote:

On Tue, Mar 13, 2018 at 4:02 AM, Narendra Pradeep U U <narendra.prad...@zohocorp.com> wrote:
> Hi,
>
> Thanks everyone for your suggestions. I would like to add EXPLAIN ANALYZE
> output for both plans so that we have a broader picture. I have a work_mem
> of 1000 MB.

Is it possible to repeat with 2000MB or 3000MB? It would be interesting to see what the estimated cost and the actual time would be if there were only 1 batch rather than 2.

Also, can you repeat all of these with EXPLAIN (ANALYZE, TIMING OFF)? Sometimes the act of measuring the times can distort them by quite a bit. (It will still give an overall execution time; it just won't try to attribute that time to the individual steps.)

Cheers,
Jeff
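A sketch of how the rerun Jeff suggests could look, assuming the same tab1/tab2 tables as above; SET LOCAL keeps the work_mem change scoped to one transaction, and TIMING OFF suppresses the per-node clock calls while still reporting the total execution time.

BEGIN;
-- Raise work_mem for this transaction only, so the hash can stay in one batch.
SET LOCAL work_mem = '2GB';

-- TIMING OFF avoids per-node timing overhead that can distort the measurement;
-- the overall execution time is still printed.
EXPLAIN (ANALYZE, TIMING OFF)
SELECT b FROM tab2 LEFT JOIN tab1 ON a = b;
ROLLBACK;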