Re: Determining number of executors within RDD

2015-06-10 Thread Nishkam Ravi
This PR adds support for multiple executors per worker: https://github.com/apache/spark/pull/731, and should be available in 1.4. Thanks, Nishkam On Wed, Jun 10, 2015 at 1:35 PM, Evo Eftimov wrote: > We/I were discussing STANDALONE mode, besides maxdml had already > summarized what is available
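As a rough illustration of what multiple executors per worker means for capacity planning (a hypothetical sketch, not code from the PR), a standalone worker can host as many executors as its cores and memory allow; the values below are made-up examples, not Spark defaults:

```python
# Hypothetical illustration: with multiple executors per worker, the
# worker fits as many executors as its cores and memory both allow.
# All numbers here are illustrative, not Spark defaults.

def executors_per_worker(worker_cores, worker_mem_mb,
                         executor_cores, executor_mem_mb):
    """Return how many executors a single worker could host."""
    by_cores = worker_cores // executor_cores
    by_mem = worker_mem_mb // executor_mem_mb
    return min(by_cores, by_mem)

# A 16-core, 64 GB worker running 4-core, 8 GB executors:
print(executors_per_worker(16, 65536, 4, 8192))  # -> 4 (core-bound)
```

Here the core budget, not memory, is the binding constraint.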

Re: clarification for some spark on yarn configuration options

2014-09-22 Thread Nishkam Ravi
t's since fixed? I'm on 1.0.1 and using 'yarn-cluster' as the > master. 'yarn-client' seems to pick up the values and works fine. > > Greg > > From: Nishkam Ravi > Date: Monday, September 22, 2014 3:30 PM > To: Greg > Cc: Andrew Or

Re: clarification for some spark on yarn configuration options

2014-09-22 Thread Nishkam Ravi
Greg, if you look carefully, the code is enforcing that the memoryOverhead be lower (and not higher) than spark.driver.memory. Thanks, Nishkam On Mon, Sep 22, 2014 at 1:26 PM, Greg Hill wrote: > I thought I had this all figured out, but I'm getting some weird errors > now that I'm attempting t
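The constraint Nishkam describes can be sketched as a standalone check (an illustrative reimplementation, not Spark's actual code): the YARN driver memoryOverhead must stay below spark.driver.memory.

```python
# Illustrative sketch of the constraint described above (not Spark's
# actual implementation): spark.yarn.driver.memoryOverhead must be
# lower than spark.driver.memory, since the overhead is off-heap
# headroom carved out relative to the driver's own allocation.

def validate_overhead(driver_memory_mb, memory_overhead_mb):
    """Raise if the overhead is not strictly below the driver memory."""
    if memory_overhead_mb >= driver_memory_mb:
        raise ValueError(
            "memoryOverhead (%d MB) must be lower than "
            "spark.driver.memory (%d MB)"
            % (memory_overhead_mb, driver_memory_mb))
    return True

print(validate_overhead(4096, 512))  # -> True (512 MB overhead, 4 GB driver)
```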

Re: SparkSql is slow over yarn

2014-08-29 Thread Nishkam Ravi
Can you share more details about your job, cluster properties and configuration parameters? Thanks, Nishkam On Fri, Aug 29, 2014 at 11:33 AM, Chirag Aggarwal < chirag.aggar...@guavus.com> wrote: > When I run SparkSql over yarn, it runs 2-4 times slower as compared to > when it's run in local mo

Re: Configuring Spark Memory

2014-07-23 Thread Nishkam Ravi
See if this helps: https://github.com/nishkamravi2/SparkAutoConfig/ It's a very simple tool for auto-configuring default parameters in Spark. Takes as input high-level parameters (like number of nodes, cores per node, memory per node, etc.) and spits out default configuration, user advice and comm
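The tool's input/output shape suggests something like the following (a hypothetical sketch, not the actual SparkAutoConfig code; the heuristics — reserving ~10% of node memory for the OS, aiming for ~4 cores per executor — are illustrative guesses, not the tool's real rules):

```python
# Hypothetical auto-configuration helper in the spirit of the linked
# tool. The heuristics below are illustrative assumptions only.

def suggest_config(num_nodes, cores_per_node, mem_per_node_gb):
    """Map cluster-level numbers to suggested Spark defaults."""
    usable_gb = int(mem_per_node_gb * 0.9)            # headroom for the OS
    executors_per_node = max(1, cores_per_node // 4)  # ~4 cores per executor
    return {
        "spark.executor.instances": num_nodes * executors_per_node,
        "spark.executor.cores": min(4, cores_per_node),
        "spark.executor.memory": "%dg" % max(1, usable_gb // executors_per_node),
    }

# Five 16-core, 64 GB nodes:
print(suggest_config(num_nodes=5, cores_per_node=16, mem_per_node_gb=64))
```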

Re: executor-cores vs. num-executors

2014-07-16 Thread Nishkam Ravi
I think two small JVMs would often beat a large one due to lower GC overhead.
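The trade-off can be made concrete with some simple arithmetic (an illustrative sketch; the node sizes are made up): the same node resources can back one large executor or several smaller ones, and smaller heaps tend to mean shorter GC pauses.

```python
# Illustrative arithmetic for the executor-cores vs. num-executors
# trade-off: enumerate ways to split one node's resources into equal
# executors. Fewer, larger executors mean bigger heaps (longer GC
# pauses); more, smaller executors mean smaller heaps per JVM.

def layouts(node_cores, node_mem_gb):
    """Enumerate (num_executors, cores_each, heap_gb_each) splits."""
    out = []
    for n in (1, 2, 4):
        if node_cores % n == 0:
            out.append((n, node_cores // n, node_mem_gb // n))
    return out

for n, cores, heap in layouts(16, 64):
    print("%d executor(s) x %d cores, %d GB heap each" % (n, cores, heap))
```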

Re: Spark on YARN performance

2014-04-18 Thread Nishkam Ravi
Spark-on-YARN takes 10-30 seconds of setup time for workloads like WordCount and PageRank on a small cluster and thereafter performs as well as Spark standalone, as has been noted by Tom and Patrick. However, a certain amount of configuration/tuning effort is required to match peak performance.