56 MB / 26 MB is very small. Do you observe data skew? More precisely, are
there many records with the same chrname / name? Can you also double-check the
JVM settings for the executor process?
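
One way to check for this kind of skew is to count how often each join key occurs on both sides. In an inner equi-join, every key contributes (rows on the left) x (rows on the right) output rows before the WHERE filter is applied, so a few hot keys can blow up the shuffle even when the inputs are tiny. Below is a minimal sketch in plain Python (no Spark needed). The function name and the sample key lists are made up for illustration and are not taken from the actual tables.

```python
from collections import Counter


def join_fanout(left_keys, right_keys, top=5):
    """Estimate inner equi-join output size from the join-key columns alone."""
    left = Counter(left_keys)
    right = Counter(right_keys)
    # Each shared key contributes left_count * right_count joined rows.
    per_key = {k: left[k] * right[k] for k in left.keys() & right.keys()}
    total = sum(per_key.values())
    # Keys that produce the most joined rows come first.
    hottest = sorted(per_key.items(), key=lambda kv: kv[1], reverse=True)[:top]
    return total, hottest


# Hypothetical key columns: one dominant key on both sides.
db_names = ["chr1"] * 1000 + ["chr2"] * 10
sample_names = ["chr1"] * 500 + ["chr2"] * 5

total, hottest = join_fanout(db_names, sample_names)
print(total)    # joined rows before the WHERE filter
print(hottest)  # the keys responsible for most of them
```

If a handful of keys account for nearly all of the estimated rows, the join is skewed and most of the work lands on the few tasks that handle those keys.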


From: [email protected] [mailto:[email protected]]
Sent: Tuesday, May 5, 2015 7:50 PM
To: Cheng, Hao; Wang, Daoyuan; Olivier Girardot; user
Subject: Re: Re: sparksql running slow while joining_2_tables.


Hi guys,

          Attached are the picture of the physical plan and the logs. Thanks.

--------------------------------

Thanks & Best regards!
罗辉 San.Luo

----- Original Message -----
From: "Cheng, Hao" <[email protected]>
To: "Wang, Daoyuan" <[email protected]>,
"[email protected]" <[email protected]>, Olivier Girardot
<[email protected]>, user <[email protected]>
Subject: Re: sparksql running slow while joining_2_tables.
Date: May 5, 2015, 13:18


I assume you’re using the DataFrame API within your application.



sql("SELECT…").explain(true)



From: Wang, Daoyuan
Sent: Tuesday, May 5, 2015 10:16 AM
To: [email protected]; Cheng, Hao; Olivier Girardot; user
Subject: RE: Re: RE: Re: Re: sparksql running slow while joining_2_tables.



You can use

EXPLAIN EXTENDED SELECT …



From: [email protected] [mailto:[email protected]]
Sent: Tuesday, May 05, 2015 9:52 AM
To: Cheng, Hao; Olivier Girardot; user
Subject: Re: RE: Re: Re: sparksql running slow while joining_2_tables.



As far as I know, broadcast join is enabled automatically via
spark.sql.autoBroadcastJoinThreshold.

Refer to
http://spark.apache.org/docs/latest/sql-programming-guide.html#other-configuration-options
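
As a rough sanity check on whether either table would fall under that threshold: the configuration guide linked above lists 10485760 bytes (10 MB) as the default for spark.sql.autoBroadcastJoinThreshold. The sketch below, in plain Python, only does that arithmetic with the file sizes mentioned in this thread. It assumes the on-disk file size is a fair proxy for the size estimate Spark uses, which may not hold in practice. The helper name is made up for illustration.

```python
MB = 1024 * 1024

# Default of spark.sql.autoBroadcastJoinThreshold per the Spark SQL guide (10 MB).
DEFAULT_THRESHOLD = 10485760

# File sizes reported in this thread, used here as a stand-in for table size.
table_sizes = {"db": 56 * MB, "sample": 26 * MB}


def fits_broadcast(size_in_bytes, threshold=DEFAULT_THRESHOLD):
    """True if a table of this size is at most the broadcast threshold."""
    return size_in_bytes <= threshold


for name, size in table_sizes.items():
    print(name, size, fits_broadcast(size))

# A threshold large enough to cover the smaller table, e.g. 32 MB.
print(fits_broadcast(table_sizes["sample"], threshold=32 * MB))
```

Under that assumption, neither table is small enough for the default threshold, so the threshold would have to be raised above the size of the smaller table before an automatic broadcast join could be expected.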



Also, how can I check my app's physical plan, and other things such as the
optimized plan, the executable plan, etc.?



thanks



--------------------------------



Thanks & Best regards!
罗辉 San.Luo



----- Original Message -----
From: "Cheng, Hao" <[email protected]>
To: "Cheng, Hao" <[email protected]>,
"[email protected]" <[email protected]>, Olivier Girardot
<[email protected]>, user <[email protected]>
Subject: RE: Re: Re: sparksql running slow while joining_2_tables.
Date: May 5, 2015, 08:38



Or, have you tried a broadcast join?



From: Cheng, Hao [mailto:[email protected]]
Sent: Tuesday, May 5, 2015 8:33 AM
To: [email protected]; Olivier Girardot; user
Subject: RE: Re: Re: sparksql running slow while joining 2 tables.



Can you print out the physical plan?



EXPLAIN SELECT xxx…



From: [email protected] [mailto:[email protected]]
Sent: Monday, May 4, 2015 9:08 PM
To: Olivier Girardot; user
Subject: Re: Re: sparksql running slow while joining 2 tables.



Hi Olivier,

Spark 1.3.1, with Java 1.8.0_45.

I have also attached 2 pictures.

It seems like a GC issue. I also tried different parameters, such as the
memory size of the driver and executor, memory fraction, Java opts...

but the issue still happens.



--------------------------------



Thanks & Best regards!
罗辉 San.Luo



----- Original Message -----
From: Olivier Girardot <[email protected]>
To: [email protected], user <[email protected]>
Subject: Re: sparksql running slow while joining 2 tables.
Date: May 4, 2015, 20:46



Hi,
What is your Spark version?



Regards,



Olivier.



On Mon, May 4, 2015 at 11:03, <[email protected]> wrote:

Hi guys,

        When I run a SQL query like "select a.name, a.startpoint, a.endpoint,
a.piece from db a join sample b on (a.name = b.name) where (b.startpoint >
a.startpoint + 25);" I find that Spark SQL runs slowly, taking minutes, which
may be caused by very long GC and shuffle times.



       Table db is created from a 56 MB txt file, while table sample is
26 MB; both are small.

       My Spark cluster is a standalone pseudo-distributed Spark cluster with
an 8g executor and a 4g driver manager.

       Any advice? Thank you, guys.





--------------------------------



Thanks & Best regards!
罗辉 San.Luo
