56 MB / 26 MB is very small. Do you observe data skew, i.e. many records with the same chrname / name? Can you also double-check the JVM settings for the executor process?
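A rough way to check for the skew asked about above is to count how often each join key occurs on both sides and multiply the counts. That gives the number of row pairs the equi-join on name produces before the startpoint filter is applied. This is only a sketch: the key values below are made up for illustration, and join_fanout is a hypothetical helper, not part of Spark.

```python
from collections import Counter


def join_fanout(left_keys, right_keys):
    """Return (total matched pairs, pairs per key) for an equi-join on the key."""
    left = Counter(left_keys)
    right = Counter(right_keys)
    # Each key contributes (occurrences on the left) * (occurrences on the right) pairs.
    per_key = {k: left[k] * right[k] for k in left if k in right}
    return sum(per_key.values()), per_key


# Hypothetical name columns standing in for tables db and sample.
db_names = ["chr1"] * 4 + ["chr2"] * 2 + ["chr3"]
sample_names = ["chr1"] * 3 + ["chr2"] + ["chrX"]

total, per_key = join_fanout(db_names, sample_names)
heaviest = max(per_key, key=per_key.get)
print(total, heaviest, per_key[heaviest])
```

If a handful of names account for most of the pairs, even 56 MB and 26 MB inputs can expand into a very large intermediate result, which would be consistent with long shuffle and GC times.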

From: [email protected] [mailto:[email protected]]
Sent: Tuesday, May 5, 2015 7:50 PM
To: Cheng, Hao; Wang, Daoyuan; Olivier Girardot; user
Subject: Re: Re: sparksql running slow while joining_2_tables.

Hi guys,

Attached are a picture of the physical plan and the logs. Thanks.

--------------------------------
Thanks & Best regards!
罗辉 San.Luo

----- Original Message -----
From: "Cheng, Hao" <[email protected]>
To: "Wang, Daoyuan" <[email protected]>, "[email protected]" <[email protected]>, Olivier Girardot <[email protected]>, user <[email protected]>
Subject: Re: sparksql running slow while joining_2_tables.
Date: May 5, 2015, 13:18

I assume you're using the DataFrame API within your application:

sql("SELECT…").explain(true)

From: Wang, Daoyuan
Sent: Tuesday, May 5, 2015 10:16 AM
To: [email protected]; Cheng, Hao; Olivier Girardot; user
Subject: RE: Re: RE: Re: Re: sparksql running slow while joining_2_tables.

You can use EXPLAIN EXTENDED SELECT ….

From: [email protected] [mailto:[email protected]]
Sent: Tuesday, May 05, 2015 9:52 AM
To: Cheng, Hao; Olivier Girardot; user
Subject: Re: RE: Re: Re: sparksql running slow while joining_2_tables.

As far as I know, broadcast join is enabled automatically through spark.sql.autoBroadcastJoinThreshold; see http://spark.apache.org/docs/latest/sql-programming-guide.html#other-configuration-options

How do I check my app's physical plan, and other things like the optimized plan, the executable plan, etc.?

Thanks.

--------------------------------
Thanks & Best regards!
罗辉 San.Luo

----- Original Message -----
From: "Cheng, Hao" <[email protected]>
To: "Cheng, Hao" <[email protected]>, "[email protected]" <[email protected]>, Olivier Girardot <[email protected]>, user <[email protected]>
Subject: RE: Re: Re: sparksql running slow while joining_2_tables.
Date: May 5, 2015, 08:38

Or have you tried a broadcast join?

From: Cheng, Hao [mailto:[email protected]]
Sent: Tuesday, May 5, 2015 8:33 AM
To: [email protected]; Olivier Girardot; user
Subject: RE: Re: Re: sparksql running slow while joining 2 tables.

Can you print out the physical plan?

EXPLAIN SELECT xxx…

From: [email protected] [mailto:[email protected]]
Sent: Monday, May 4, 2015 9:08 PM
To: Olivier Girardot; user
Subject: Re: Re: sparksql running slow while joining 2 tables.

Hi Olivier,

Spark 1.3.1 with Java 1.8.0_45; I have added 2 pictures. It seems like a GC issue. I also tried different parameters, such as the driver and executor memory sizes, the memory fraction, and Java opts, but the issue still happens.

--------------------------------
Thanks & Best regards!
罗辉 San.Luo

----- Original Message -----
From: Olivier Girardot <[email protected]>
To: [email protected], user <[email protected]>
Subject: Re: sparksql running slow while joining 2 tables.
Date: May 4, 2015, 20:46

Hi,
What is your Spark version?
Regards,
Olivier.

On Mon, May 4, 2015 at 11:03, <[email protected]> wrote:

Hi guys,

When I run a SQL query like

"select a.name, a.startpoint, a.endpoint, a.piece from db a join sample b on (a.name = b.name) where (b.startpoint > a.startpoint + 25);"

Spark SQL runs slowly, taking minutes, which may be caused by very long GC and shuffle times.

Table db is created from a 56 MB txt file and table sample from a 26 MB one, so both are small. My Spark cluster is a standalone pseudo-distributed cluster with an 8 GB executor and a 4 GB driver.

Any advice? Thank you, guys.

--------------------------------
Thanks & Best regards!
罗辉 San.Luo
