Re: Are Tachyon and Akka removed from 2.1.1, please?

2017-05-22 Thread Chin Wei Low
I think Akka has been removed since 2.0.

On 22 May 2017 10:19 pm, "Gene Pang" wrote:
> Hi,
>
> Tachyon has been renamed to Alluxio. Here is the documentation for
> running Alluxio with Spark.
>
> Hope this helps,
> Gene
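A minimal sketch of pointing Spark at data stored in Alluxio, assuming an Alluxio master reachable at alluxio-master on the default RPC port 19998; the hostname and path are placeholders, and the Alluxio client jar must be on Spark's classpath:

  // Read a file from Alluxio through the alluxio:// scheme.
  val lines = sc.textFile("alluxio://alluxio-master:19998/data/input.txt")
  lines.count()  // materialize to verify the read works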

Re: Spark app writes too many small parquet files

2016-11-28 Thread Chin Wei Low
Try limiting the partitions: spark.sql.shuffle.partitions controls the number of files generated.

On 28 Nov 2016 8:29 p.m., "Kevin Tran" wrote:
> Hi Denny,
> Thank you for your inputs. I also use 128 MB, but still too many files are
> generated by the Spark app, and they are only ~14 KB each! That's why I'
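A minimal sketch of the suggestion, assuming a Spark 1.x sqlContext; the output path and the DataFrame name result are illustrative, and 10 partitions is a placeholder value to tune against your data volume:

  // Fewer shuffle partitions means fewer output files from the final stage.
  sqlContext.setConf("spark.sql.shuffle.partitions", "10")
  result.write.parquet("/path/to/output")

Alternatively, result.coalesce(10).write.parquet("/path/to/output") reduces the file count for a single write without changing the global setting.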

Re: Spark SQL is slower when DataFrame is cached in Memory

2016-10-25 Thread Chin Wei Low
> Best Regards,
> Kazuaki Ishizaki
>
> From: Chin Wei Low
> To: Kazuaki Ishizaki/Japan/IBM@IBMJP
> Cc: user@spark.apache.org
> Date: 2016/10/10 11:33
>
> Subject: Re: Spark SQL is slower when DataFrame is cached in Memory

Re: [Spark] RDDs are not persisting in memory

2016-10-10 Thread Chin Wei Low
Hi,

Your RDD is 5 GB; perhaps it is too large to fit into the executors' storage memory. You can check the Executors tab in the Spark UI for the memory available for storage on each executor.

Regards,
Chin Wei

On Tue, Oct 11, 2016 at 6:14 AM, diplomatic Guru wrote:
> Hello team,
>
> Spa
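A hedged sketch of one way to act on this advice: persist with a storage level that spills to disk rather than dropping partitions that do not fit; the path is a placeholder:

  import org.apache.spark.storage.StorageLevel

  val rdd = sc.textFile("/path/to/data")
  // MEMORY_AND_DISK keeps what fits in memory and spills the rest to disk,
  // whereas the default MEMORY_ONLY simply skips caching partitions that
  // don't fit, so they are recomputed on every use.
  rdd.persist(StorageLevel.MEMORY_AND_DISK)
  rdd.count()  // an action is needed before anything appears under the Storage tab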

Re: Spark SQL is slower when DataFrame is cached in Memory

2016-10-09 Thread Chin Wei Low
> res.explain(true)
> res.collect()
>
> Do I misunderstand something?
>
> Best Regards,
> Kazuaki Ishizaki
>
> From: Chin Wei Low
> To: Kazuaki Ishizaki/Japan/IBM@IBMJP
> Cc: user@spark.apache.org
> Date:

Re: Spark SQL is slower when DataFrame is cached in Memory

2016-10-07 Thread Chin Wei Low
> Best Regards,
> Kazuaki Ishizaki
>
> From: Chin Wei Low
> To: user@spark.apache.org
> Date: 2016/10/07 13:05
> Subject: Spark SQL is slower when DataFrame is cached in Memory
> --
>
> Hi,

Spark SQL is slower when DataFrame is cached in Memory

2016-10-06 Thread Chin Wei Low
Hi,

I am using Spark 1.6.0. I have a Spark application that creates and caches (in memory) 50+ DataFrames (some over a single parquet file, some over a folder with a few parquet files) with the following code:

val df = sqlContext.read.parquet
df.persist
df.count

I union them to 3 DataFram
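A self-contained sketch of the pattern described above, with the elided path and a hypothetical table name filled in for illustration:

  // Read, cache, and materialize a DataFrame, then query it via SQL.
  val df = sqlContext.read.parquet("/path/to/parquet")  // placeholder path
  df.persist()       // mark the DataFrame for caching
  df.count()         // action that actually populates the cache
  df.registerTempTable("events")                        // hypothetical name
  val res = sqlContext.sql("SELECT COUNT(*) FROM events")
  res.explain(true)  // compare the plan before and after caching
  res.collect()

The res.explain(true) and res.collect() lines mirror the reproduction steps quoted earlier in this thread.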