Re: Spark Performance on Yarn

2015-04-22 Thread Neelesh Salian
Does it still hit the memory limit for the container? Is there an expensive transformation involved? On Wed, Apr 22, 2015 at 8:45 AM, Ted Yu wrote: > In master branch, overhead is now 10%. That would be 500 MB. FYI.

Re: Spark Performance on Yarn

2015-04-22 Thread Ted Yu
In the master branch, overhead is now 10%. That would be 500 MB. FYI. > On Apr 22, 2015, at 8:26 AM, nsalian wrote: > +1 to executor-memory to 5g. ...
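
For concreteness, here is a sketch of what those numbers imply, assuming the max(10% of executor memory, 384 MB) default this thread describes; the class name and jar path are placeholders:

  # Default overhead in master at the time: max(0.10 * executor memory, 384 MB),
  # so --executor-memory 5g gets roughly 500 MB of overhead on top of the heap.
  # To size it explicitly instead (the property takes a value in MB):
  spark-submit --master yarn \
    --executor-memory 5g \
    --conf spark.yarn.executor.memoryOverhead=512 \
    --class com.example.MyJob my-job.jar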

Re: Spark Performance on Yarn

2015-04-22 Thread nsalian
+1 to setting executor-memory to 5g. Do check the overhead space for both the driver and the executor, as per Wilfred's suggestion. Typically, 384 MB should suffice.

Re: Spark Performance on Yarn

2015-04-21 Thread hnahak
Try --executor-memory 5g, since you have 8 GB of RAM in each machine.

Re: Spark Performance on Yarn

2015-04-20 Thread Peng Cheng
I got exactly the same problem, except that I'm running on a standalone master. Can you tell me the counterpart parameter on a standalone master for increasing the same memory overhead?

Re: Spark Performance on Yarn

2015-02-23 Thread Lee Bierman
Thanks for the suggestions. I removed the "persist" call from the program. Having done so, I started it with: spark-submit --class com.xxx.analytics.spark.AnalyticsJob --master yarn /tmp/analytics.jar --input_directory hdfs://ip:8020/flume/events/2015/02/ This takes all the defaults and only runs 2 executors.
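
For comparison, a sketch of the same submit with the resource flags stated explicitly; the values are illustrative starting points drawn from suggestions elsewhere in this thread, not tested recommendations:

  spark-submit --class com.xxx.analytics.spark.AnalyticsJob \
    --master yarn \
    --num-executors 4 \
    --executor-cores 2 \
    --executor-memory 5g \
    /tmp/analytics.jar --input_directory hdfs://ip:8020/flume/events/2015/02/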

Re: Spark Performance on Yarn

2015-02-21 Thread Davies Liu
How many executors do you have per machine? It would be helpful if you could list all the configs. Could you also try to run it without persist? Caching can hurt more than it helps if you don't have enough memory. On Fri, Feb 20, 2015 at 5:18 PM, Lee Bierman wrote: > Thanks for the suggestions. ...

Re: Spark Performance on Yarn

2015-02-20 Thread Lee Bierman
Thanks for the suggestions. I'm experimenting with different values for spark memoryOverhead and explicitly giving the executors more memory, but I still haven't found the happy medium that gets it to finish in a reasonable time frame. Is my cluster massively undersized at 5 boxes with 8 GB and 2 CPUs each? Trying to figure out ...

Re: Spark Performance on Yarn

2015-02-20 Thread Sandy Ryza
That's all correct. -Sandy On Fri, Feb 20, 2015 at 1:23 PM, Kelvin Chu <2dot7kel...@gmail.com> wrote: > Hi Sandy, I appreciate your clear explanation. Let me try again; it's the best way to confirm I understand. ...

Re: Spark Performance on Yarn

2015-02-20 Thread Kelvin Chu
Hi Sandy, I appreciate your clear explanation. Let me try again; it's the best way to confirm I understand. spark.executor.memory + spark.yarn.executor.memoryOverhead = the total memory for which YARN will create a JVM container. spark.executor.memory = the memory I can actually use in my JVM application, i.e. part of it ...
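
Put as arithmetic, the relationship being confirmed here (numbers illustrative):

  # container request to YARN = spark.executor.memory + spark.yarn.executor.memoryOverhead
  #   e.g. 4096 MB (heap) + 384 MB (overhead) = 4480 MB requested
  # YARN may then round the request up to its scheduler's allocation increment,
  # so the granted container can be somewhat larger than the sum.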

Re: Spark Performance on Yarn

2015-02-20 Thread Sandy Ryza
Hi Kelvin, spark.executor.memory controls the size of the executor heaps. spark.yarn.executor.memoryOverhead is the amount of memory to request from YARN beyond the heap size; this accounts for the fact that JVMs use some non-heap memory. The Spark heap is divided into spark.storage.memoryFraction ...
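
A rough sketch of the heap split being described, using the static memory manager defaults of that era and ignoring the internal safety fractions:

  # Executor heap with --executor-memory 5g (~5120 MB):
  #   spark.storage.memoryFraction = 0.6 (default) -> ~3072 MB for cached RDD blocks
  #   spark.shuffle.memoryFraction = 0.2 (default) -> ~1024 MB for shuffle buffers
  #   remainder                                    -> user code and JVM internals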

Re: Spark Performance on Yarn

2015-02-20 Thread Kelvin Chu
Hi Sandy, I am also doing memory tuning on YARN. Just want to confirm: is it correct to say spark.executor.memory - spark.yarn.executor.memoryOverhead = the memory I can actually use in my JVM application? If it is not, what is the correct relationship? Any other variables or config parameters ...

Re: Spark Performance on Yarn

2015-02-20 Thread Sandy Ryza
If that's the error you're hitting, the fix is to boost spark.yarn.executor.memoryOverhead, which will put some extra room between the executor heap sizes and the amount of memory requested for them from YARN. -Sandy On Fri, Feb 20, 2015 at 9:40 AM, lbierman wrote: > A bit more context on this issue. ...
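
A minimal sketch of that fix; the 1024 MB value is illustrative, and the class name and jar path are placeholders:

  # Raise the off-heap allowance requested from YARN (value in MB):
  spark-submit --master yarn \
    --executor-memory 2500M \
    --conf spark.yarn.executor.memoryOverhead=1024 \
    --class com.example.MyJob my-job.jar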

Re: Spark Performance on Yarn

2015-02-20 Thread lbierman
A bit more context on this issue, from the container logs on the executor. Given my cluster specs above, what would be appropriate parameters to pass in for --num-executors, --executor-cores, and --executor-memory? I had tried it with --executor-memory 2500MB:
2015-02-20 06:50:09,056 WARN org.apache.hadoop.ya...
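
A back-of-the-envelope check for the 5-box, 8 GB, 2-CPU cluster described above, assuming the overhead default discussed earlier in the thread (all numbers illustrative):

  # Each container must fit under yarn.nodemanager.resource.memory-mb on its node:
  #   --executor-memory 5g -> 5120 MB heap + ~512 MB overhead = ~5632 MB per container
  # With 2 cores per box, --executor-cores 2 and one executor per node is a natural
  # starting point, i.e. --num-executors 4 or 5 across the five boxes (one node also
  # needs room for the YARN ApplicationMaster).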

Re: Spark Performance on Yarn

2015-02-20 Thread Sandy Ryza
Are you specifying the executor memory, cores, or number of executors anywhere? If not, you won't be taking advantage of the full resources on the cluster. -Sandy On Fri, Feb 20, 2015 at 2:41 AM, Sean Owen wrote: > None of this really points to the problem. These indicate that workers died but not why. ...

Re: Spark Performance on Yarn

2015-02-20 Thread Sean Owen
None of this really points to the problem. These indicate that workers died, but not why. I'd first go locate the executor logs, which should reveal more about what's happening. It sounds like a harder type of failure: a JVM crash, running out of file handles, or GC thrashing.
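
One common way to pull those executor logs once the application has exited, assuming YARN log aggregation is enabled:

  # the application id appears in the spark-submit output and in the ResourceManager UI
  yarn logs -applicationId <application id>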