Re: input size too large | Performance issues with Spark

2015-04-05 Thread Ted Yu
Reading Sandy's blog, there seems to be one typo:

> Similarly, the heap size can be controlled with the --executor-cores flag or the spark.executor.memory property.

'--executor-memory' should be the right flag.

BTW, regarding:

> It defaults to max(384, .07 * spark.executor.memory)

that formula is the default memory overhead.
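For reference, a minimal Scala sketch of the same setting applied programmatically, with the overhead arithmetic worked out; the app name is hypothetical, and the formula is the Spark-on-YARN default quoted above:

    import org.apache.spark.{SparkConf, SparkContext}

    // Heap size is set with --executor-memory on spark-submit, or
    // equivalently through the spark.executor.memory property:
    val conf = new SparkConf()
      .setAppName("tuning-example")        // hypothetical app name
      .set("spark.executor.memory", "4g")  // executor heap size
    val sc = new SparkContext(conf)

    // With a 4g (4096 MB) heap, the default memory overhead is
    //   max(384, 0.07 * 4096) = max(384, ~287) = 384 MB,
    // so each YARN container requests roughly 4096 + 384 = 4480 MB.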

Re: input size too large | Performance issues with Spark

2015-04-02 Thread Christian Perez
To Akhil's point, see the "Tuning Data Structures" section of that guide. Avoid the standard-collections HashMap. With fewer machines, try running 4 or 5 cores per executor and only 3-4 executors (1 per node): http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/. That ought to reduce the shuffle performance hit.
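A hedged sketch of both suggestions (the sizes and names are illustrative, not from the thread):

    import scala.collection.mutable

    // Data structures: a boxed HashMap pays object headers plus a
    // boxed Int per key and per value, which the tuning guide warns
    // about:
    val boxed = mutable.HashMap[Int, Int]()
    (0 until 1000).foreach(i => boxed(i) = i * 2)

    // When keys are dense 0..n-1 indices, a primitive array holds the
    // same mapping with no boxing and far less GC pressure:
    val flat = new Array[Int](1000)
    (0 until 1000).foreach(i => flat(i) = i * 2)

    // Executor sizing: the layout suggested above maps to spark-submit
    // flags on YARN, e.g. --num-executors 3 --executor-cores 5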

Re: input size too large | Performance issues with Spark

2015-03-29 Thread Akhil Das
Go through this once, if you haven't read it already:
https://spark.apache.org/docs/latest/tuning.html

Thanks
Best Regards

On Sat, Mar 28, 2015 at 7:33 PM, nsareen wrote:
> Hi All,
>
> I'm facing performance issues with the Spark implementation, and was
> briefly investigating the WebUI logs; I noti
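One of the first items on that tuning page is switching to Kryo serialization. A minimal Scala sketch of what that looks like; the record class is hypothetical, made up for illustration:

    import org.apache.spark.SparkConf

    // Hypothetical record type standing in for whatever gets shuffled:
    case class MyRecord(id: Long, value: Double)

    val conf = new SparkConf()
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      // Registering classes lets Kryo write a small id instead of the
      // full class name with each serialized object:
      .registerKryoClasses(Array(classOf[MyRecord]))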