Re: Spark SQL is slower when DataFrame is cache in Memory

2016-10-27 Thread Kazuaki Ishizaki
: user Date: 2016/10/25 17:33 Subject: Re: Spark SQL is slower when DataFrame is cache in Memory Hi Kazuaki, I print a debug log right before I call collect, and compare it against the job start log (which is available when debug logging is turned on). Anyway, I test that in

Re: Spark SQL is slower when DataFrame is cache in Memory

2016-10-25 Thread Chin Wei Low
> > Best Regards, > Kazuaki Ishizaki > > > > From: Chin Wei Low > To: Kazuaki Ishizaki/Japan/IBM@IBMJP > Cc: user@spark.apache.org > Date: 2016/10/10 11:33 > > Subject: Re: Spark SQL is slower when DataFrame is cache in Memory >

Re: Spark SQL is slower when DataFrame is cache in Memory

2016-10-24 Thread Kazuaki Ishizaki
:Re: Spark SQL is slower when DataFrame is cache in Memory Hi Ishizaki san, Thanks for the reply. So, when I pre-cache the DataFrame, the cache is used during job execution. Actually there are 3 events: 1. call res.collect 2. job started 3. job completed I am concerning

Re: Spark SQL is slower when DataFrame is cache in Memory

2016-10-09 Thread Chin Wei Low
;) > res.explain(true) > res.collect() > > Am I misunderstanding something? > > Best Regards, > Kazuaki Ishizaki > > > > From: Chin Wei Low > To: Kazuaki Ishizaki/Japan/IBM@IBMJP > Cc: user@spark.apache.org > Date:

Re: Spark SQL is slower when DataFrame is cache in Memory

2016-10-07 Thread Kazuaki Ishizaki
e.org Date: 2016/10/07 20:06 Subject: Re: Spark SQL is slower when DataFrame is cache in Memory Hi Ishizaki san, So there is a gap between calling res.collect and the point where I see this log: spark.SparkContext: Starting job: collect at :26 What you mean is, during this time Spark has already started to

Re: Spark SQL is slower when DataFrame is cache in Memory

2016-10-07 Thread Chin Wei Low
> Best Regards, > Kazuaki Ishizaki > > > > From: Chin Wei Low > To: user@spark.apache.org > Date: 2016/10/07 13:05 > Subject: Spark SQL is slower when DataFrame is cache in Memory > -- > > > > Hi, >

Re: Spark SQL is slower when DataFrame is cache in Memory

2016-10-07 Thread Kazuaki Ishizaki
:Spark SQL is slower when DataFrame is cache in Memory Hi, I am using Spark 1.6.0. I have a Spark application that creates and caches (in memory) DataFrames (around 50+, some on a single parquet file and some on a folder with a few parquet files) with the following code: val df

Spark SQL is slower when DataFrame is cache in Memory

2016-10-06 Thread Chin Wei Low
Hi, I am using Spark 1.6.0. I have a Spark application that creates and caches (in memory) DataFrames (around 50+, some on a single parquet file and some on a folder with a few parquet files) with the following code: val df = sqlContext.read.parquet df.persist df.count I union them to 3 DataFram