Re:Re: RE: Re:Re:RE: Re:RE: spark 1.5 SQL slows down dramatically by 50%+ compared with spark 1.4.1 SQL

2015-09-13 Thread Todd
reduceLocality.enabled is a Spark (core) configuration, not a Spark SQL one. From: Todd [mailto:bit1...@163.com] Sent: Friday, September 11, 2015 3:39 PM To: Todd Cc: Cheng, Hao; Jesse F Chen; Michael Armbrust; user@spark.apache.org S
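A minimal sketch of that distinction, assuming the Spark 1.5 Scala API. As far as we can tell, a core setting such as spark.shuffle.reduceLocality.enabled is read by the scheduler when the SparkContext starts. It therefore belongs on the SparkConf (or on the spark-submit command line). By contrast, spark.sql.* options such as spark.sql.planner.sortMergeJoin can be changed later through the SQLContext. The object name, app name, and launchConf helper are hypothetical.

```scala
import org.apache.spark.SparkConf

object ReduceLocalityConf {
  // Hypothetical helper: the configuration handed to `new SparkContext(conf)`.
  // Core (non-SQL) settings go here, before the context is created.
  def launchConf(): SparkConf =
    new SparkConf() // also picks up spark.* properties set by spark-submit (e.g. the master)
      .setAppName("spark15-vs-1.4.1") // hypothetical app name
      .set("spark.shuffle.reduceLocality.enabled", "false")

  def main(args: Array[String]): Unit = {
    val conf = launchConf()
    println(conf.get("spark.shuffle.reduceLocality.enabled"))
    // SQL options are set per SQLContext afterwards, e.g.:
    //   val sc = new SparkContext(conf)
    //   val sqlContext = new SQLContext(sc)
    //   sqlContext.setConf("spark.sql.planner.sortMergeJoin", "false")
  }
}
```

The command-line equivalent for the core setting would be spark-submit --conf spark.shuffle.reduceLocality.enabled=false.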

Re: Re:Re:RE: Re:RE: spark 1.5 SQL slows down dramatically by 50%+ compared with spark 1.4.1 SQL

2015-09-11 Thread Davies Liu
Davies Liu, 09/11/2015 10:41:23 AM: On Fri, Sep 11, 2015 at 10:31 AM, Jesse F Chen wrote: From: Davies Liu To: Jesse F Chen/San Francisco/IBM@IBMUS Cc: "Cheng, Hao", Todd, Michael Armbrust, "user@spark.apache.org"

Re: Re:Re:RE: Re:RE: spark 1.5 SQL slows down dramatically by 50%+ compared with spark 1.4.1 SQL

2015-09-11 Thread Jesse F Chen
ancisco/IBM@IBMUS Cc: "Cheng, Hao", Todd, Michael Armbrust, "user@spark.apache.org" Date: 09/11/2015 10:41 AM Subject: Re: Re:Re:RE: Re:RE: spark 1.5 SQL slows down dramatically by 50%+ compared with spark 1.4.1 SQL On Fri, Se

Re: Re:Re:RE: Re:RE: spark 1.5 SQL slows down dramatically by 50%+ compared with spark 1.4.1 SQL

2015-09-11 Thread Davies Liu
n/San Francisco/IBM@IBMUS, Michael Armbrust, "user@spark.apache.org" Date: 09/11/2015 01:00 AM Subject: RE: Re:Re:RE: Re:RE: spark 1.5 SQL slows down dramatically by 50%+ compared with spark 1.4.1 SQL

Re: RE: Re:Re:RE: Re:RE: spark 1.5 SQL slows down dramatically by 50%+ compared with spark 1.4.1 SQL

2015-09-11 Thread Davies Liu
From: Todd [mailto:bit1...@163.com] Sent: Friday, September 11, 2015 3:39 PM To: Todd Cc: Cheng, Hao; Jesse F Chen; Michael Armbrust; user@spark.apache.org Subject: Re:Re:RE: Re:RE: spark 1.5 SQL slows down dramatically by 50%+ compared with spark 1.4.1 SQL

RE: Re:Re:RE: Re:RE: spark 1.5 SQL slows down dramatically by 50%+ compared with spark 1.4.1 SQL

2015-09-11 Thread Jesse F Chen
l Armbrust, "user@spark.apache.org" Date: 09/11/2015 01:00 AM Subject: RE: Re:Re:RE: Re:RE: spark 1.5 SQL slows down dramatically by 50%+ compared with spark 1.4.1 SQL Can you confirm that the query really runs in cluster mode? Not the local
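One way to answer that question from inside the driver, sketched with the Spark 1.5 Scala API. SparkContext.master returns the master URL, and SparkContext.isLocal is true for "local" and "local[...]" masters. The object and the describe helper are hypothetical. The Executors tab of the Spark UI shows the same information from outside.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object DeployModeCheck {
  // Hypothetical helper: reports whether this SparkContext runs locally or on a cluster.
  // sc.master is the master URL (e.g. "local[*]", "yarn-client");
  // sc.isLocal is true for "local" and "local[...]" masters.
  def describe(sc: SparkContext): String =
    if (sc.isLocal) s"LOCAL mode (master=${sc.master})"
    else s"cluster mode (master=${sc.master}, defaultParallelism=${sc.defaultParallelism})"

  def main(args: Array[String]): Unit = {
    // No setMaster here: the master comes from spark-submit (--master yarn-client, etc.).
    val sc = new SparkContext(new SparkConf().setAppName("deploy-mode-check"))
    try println(describe(sc)) finally sc.stop()
  }
}
```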

RE:RE: Re:Re:RE: Re:RE: spark 1.5 SQL slows down dramatically by 50%+ compared with spark 1.4.1 SQL

2015-09-11 Thread prosp4300
1.5 SQL slows down dramatically by 50%+ compared with spark 1.4.1 SQL I added the following two options: spark.sql.planner.sortMergeJoin=false and spark.shuffle.reduceLocality.enabled=false. But it still performs the same as without them. One thing is that on the Spark UI, when I click the

RE: Re:Re:RE: Re:RE: spark 1.5 SQL slows down dramatically by 50%+ compared with spark 1.4.1 SQL

2015-09-11 Thread Cheng, Hao
September 11, 2015 3:39 PM To: Todd Cc: Cheng, Hao; Jesse F Chen; Michael Armbrust; user@spark.apache.org Subject: Re:Re:RE: Re:RE: spark 1.5 SQL slows down dramatically by 50%+ compared with spark 1.4.1 SQL I added the following two options: spark.sql.planner.sortMergeJoin=false

Re:Re:RE: Re:RE: spark 1.5 SQL slows down dramatically by 50%+ compared with spark 1.4.1 SQL

2015-09-11 Thread Todd
om] Sent: Friday, September 11, 2015 2:17 PM To: Cheng, Hao Cc: Jesse F Chen; Michael Armbrust; user@spark.apache.org Subject: Re:RE: spark 1.5 SQL slows down dramatically by 50%+ compared with spark 1.4.1 SQL Thanks, Hao, for the reply. I turned the sort merge join off; the physical plan is

Re:RE: Re:RE: spark 1.5 SQL slows down dramatically by 50%+ compared with spark 1.4.1 SQL

2015-09-10 Thread Todd
rg Subject: Re:RE: spark 1.5 SQL slows down dramatically by 50%+ compared with spark 1.4.1 SQL Thanks, Hao, for the reply. I turned the sort merge join off; the physical plan is below, but the performance is roughly the same as with it on.
== Physical Plan ==
TungstenProject [ss_quantity#10,ss_lis
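A sketch for checking which join the planner actually picks, assuming the Spark 1.5 Scala API. It relies on SQLContext.setConf and on DataFrame.queryExecution, which is marked as a developer API. The sketch re-plans the same query with spark.sql.planner.sortMergeJoin on and off and returns the physical plan text, which is what df.explain() prints under "== Physical Plan ==". The object and helper names are hypothetical, and the query is whichever benchmark query is being compared.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{DataFrame, SQLContext}

object JoinPlanCheck {
  // Physical plan of `df` as text (what explain() prints under "== Physical Plan ==").
  def physicalPlan(df: DataFrame): String =
    df.queryExecution.executedPlan.toString

  // Sets spark.sql.planner.sortMergeJoin, then plans `query` from scratch so the new value is used.
  def planWithSortMergeJoin(sqlContext: SQLContext, query: String, enabled: Boolean): String = {
    sqlContext.setConf("spark.sql.planner.sortMergeJoin", enabled.toString)
    physicalPlan(sqlContext.sql(query))
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("join-plan-check"))
    val sqlContext = new SQLContext(sc) // use a HiveContext (a SQLContext subclass) for Hive tables
    val query = args(0) // hypothetical: the benchmark query, passed on the command line
    try {
      for (enabled <- Seq(true, false)) {
        println(s"--- spark.sql.planner.sortMergeJoin=$enabled")
        println(planWithSortMergeJoin(sqlContext, query, enabled))
      }
    } finally sc.stop()
  }
}
```

Small tables may still be broadcast-joined whatever this flag says, depending on spark.sql.autoBroadcastJoinThreshold, so it is worth checking the printed plan rather than assuming.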

Re:RE: Re:RE: spark 1.5 SQL slows down dramatically by 50%+ compared with spark 1.4.1 SQL

2015-09-10 Thread Todd
brust Cc: Todd; user@spark.apache.org Subject: Re: spark 1.5 SQL slows down dramatically by 50%+ compared with spark 1.4.1 SQL Could this be a build issue (i.e., sbt package)? If I run the same jar built for 1.4.1 on 1.5, I also see a large regression in queries (all other things identical). I am c

RE: Re:RE: spark 1.5 SQL slows down dramatically by 50%+ compared with spark 1.4.1 SQL

2015-09-10 Thread Cheng, Hao
From: Todd [mailto:bit1...@163.com] Sent: Friday, September 11, 2015 2:17 PM To: Cheng, Hao Cc: Jesse F Chen; Michael Armbrust; user@spark.apache.org Subject: Re:RE: spark 1.5 SQL slows down dramatically by 50%+ compared with spark 1.4.1 SQL Thanks, Hao, for the reply. I turned the sort merge join off

Re:RE: spark 1.5 SQL slows down dramatically by 50%+ compared with spark 1.4.1 SQL

2015-09-10 Thread Todd
apache.org Subject: Re: spark 1.5 SQL slows down dramatically by 50%+ compared with spark 1.4.1 SQL Could this be a build issue (i.e., sbt package)? If I run the same jar built for 1.4.1 on 1.5, I also see a large regression in queries (all other things identical). I am curious, to build 1.

RE: spark 1.5 SQL slows down dramatically by 50%+ compared with spark 1.4.1 SQL

2015-09-10 Thread Cheng, Hao
@spark.apache.org Subject: Re: spark 1.5 SQL slows down dramatically by 50%+ compared with spark 1.4.1 SQL Could this be a build issue (i.e., sbt package)? If I run the same jar built for 1.4.1 on 1.5, I also see a large regression in queries (all other things identical). I am curious, to build 1.5 (when

Re: spark 1.5 SQL slows down dramatically by 50%+ compared with spark 1.4.1 SQL

2015-09-10 Thread Jesse F Chen
ecial parameters I should be using to make sure I load the latest Hive dependencies? From: Michael Armbrust To: Todd Cc: "user@spark.apache.org" Date: 09/10/2015 11:07 AM Subject: Re: spark 1.5 SQL slows down dramatically by 50%+ compared with spark 1.
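On the build question, here is a minimal build.sbt sketch. It assumes sbt 0.13 and the Scala 2.10 build that Spark 1.5.0 uses by default. The idea is to compile the application against the same Spark version the cluster runs. Marking the Spark artifacts "provided" means the cluster's own jars, including its Hive support, are used at runtime, which rules out reusing a stale 1.4.1-built jar. The thread does not establish whether this explains the regression. The version numbers are assumptions to adjust.

```scala
// build.sbt (sketch): compile against the Spark version the cluster runs.
// "provided" keeps Spark out of the application jar; spark-submit supplies it.
scalaVersion := "2.10.4"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.5.0" % "provided",
  "org.apache.spark" %% "spark-sql"  % "1.5.0" % "provided",
  "org.apache.spark" %% "spark-hive" % "1.5.0" % "provided" // only if HiveContext is used
)
```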

Re:Re: spark 1.5 SQL slows down dramatically by 50%+ compared with spark 1.4.1 SQL

2015-09-10 Thread Todd
Thanks, Michael, for the reply. Below are the SQL plans for 1.5 and 1.4.1: 1.5 uses SortMergeJoin, while 1.4.1 uses ShuffledHashJoin. In this case, the hash join seems to perform better than the sort merge join.

Re: spark 1.5 SQL slows down dramatically by 50%+ compared with spark 1.4.1 SQL

2015-09-10 Thread Michael Armbrust
I've been running TPC-DS SF=1500 daily on Spark 1.4.1 and Spark 1.5 on S3, so this is surprising. In my experiments, Spark 1.5 is either the same as or faster than 1.4, with only small exceptions. A few thoughts:
- 600 partitions is probably far too many for 6 GB of data.
- Providing the output of ex
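Rough arithmetic behind the partition point, as a small Scala sketch. Spreading 6 GB (6144 MB) over 600 partitions gives about 10 MB per partition. If the 600 refers to spark.sql.shuffle.partitions, a count can be derived from a target size instead. The 128 MB target is an assumption, not something stated in the thread. The on-disk size may also differ from the amount actually shuffled. The object and helper names are hypothetical.

```scala
object PartitionSizing {
  // Average partition size in MB when `totalMB` of data is spread over `partitions`.
  def mbPerPartition(totalMB: Double, partitions: Int): Double =
    totalMB / partitions

  // Partition count that gives roughly `targetMB` per partition (at least 1).
  def partitionsFor(totalMB: Double, targetMB: Double): Int =
    math.max(1, math.ceil(totalMB / targetMB).toInt)

  def main(args: Array[String]): Unit = {
    val totalMB = 6 * 1024.0 // store_sales is about 6 GB
    println(f"600 partitions -> ${mbPerPartition(totalMB, 600)}%.2f MB each")
    val n = partitionsFor(totalMB, 128.0) // 128 MB per partition is an assumed target
    println(s"128 MB target -> $n partitions")
    // e.g. sqlContext.setConf("spark.sql.shuffle.partitions", n.toString)
  }
}
```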

spark 1.5 SQL slows down dramatically by 50%+ compared with spark 1.4.1 SQL

2015-09-10 Thread Todd
Hi, I am using data generated with spark-sql-perf (https://github.com/databricks/spark-sql-perf) to test Spark SQL performance (Spark on YARN, 10 nodes) with the following code (the store_sales table has about 90 million records and is 6 GB in size):
val outputDir="hdfs://tmp/spark_perf/scaleFact
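To compare 1.5 with 1.4.1 run for run, here is a small hypothetical timing helper in Scala; it is not part of spark-sql-perf. It measures the wall-clock time of a block and repeats it, so that a first, warm-up run can be told apart from later runs. The block passed in must force execution, for example by collecting the result, because DataFrame transformations are lazy.

```scala
object QueryTimer {
  // Runs `body` once and returns its result together with the elapsed wall-clock time in ms.
  def timed[A](body: => A): (A, Long) = {
    val start = System.nanoTime()
    val result = body
    (result, (System.nanoTime() - start) / 1000000L)
  }

  // Re-evaluates `body` `runs` times and returns each elapsed time in ms.
  def timeRuns(runs: Int)(body: => Any): Seq[Long] =
    (1 to runs).map(_ => timed(body)._2)

  def main(args: Array[String]): Unit = {
    val times = timeRuns(3) { (1L to 1000000L).sum }
    println(times.mkString("ms, ") + "ms")
  }
}
```

With Spark, a call would look like timeRuns(3) { sqlContext.sql(query).collect() }, where query is the benchmark query under test.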