So you have 90 GB of memory and 24 cores in total. Let's say you want to use 80% of all that memory (leaving the rest for other components), so you have 72 GB to use.

You want to take advantage of all the cores and all of the memory, so something close to this should work:

  executor memory     = 6g
  number of executors = 12
  cores per executor  = 2

That gives 6 GB x 12 = 72 GB of memory and 12 x 2 = 24 cores. As for your YARN question: in YARN, a container corresponds to an executor.
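If you launch with spark-submit on YARN, those settings map to flags roughly like the following (the application jar and its arguments here are placeholders, substitute your own):

  spark-submit \
    --master yarn \
    --num-executors 12 \
    --executor-cores 2 \
    --executor-memory 6g \
    your-app.jar [app args]

Equivalently, you can set the properties spark.executor.instances, spark.executor.cores and spark.executor.memory the same way you set them below.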
JESSE CHEN
Big Data Performance | IBM Analytics
Office: 408 463 2296
Mobile: 408 828 9068
Email: jfc...@us.ibm.com

From: Chadha Pooja <chadha.po...@bcg.com>
To: "user@spark.apache.org" <user@spark.apache.org>
Date: 03/03/2016 03:31 PM
Subject: Configuring/Optimizing Spark

Hi,

I am trying to understand the best parameter settings for processing a 12.5 GB file with my Spark cluster. I am using a 3-node cluster, with 8 cores and 30 GiB of RAM on each node. I used Cloudera's "top 5 mistakes" article and tried the following configuration:

  spark.executor.instances              6
  spark.executor.cores                  5
  spark.executor.memory                 15g
  yarn.scheduler.maximum-allocation-mb  10000

Can someone please confirm whether these are correct? As an aside, I would like to better understand how YARN works with Spark: could you help me with the difference between a container and an executor in YARN?

Thanks!
Pooja
The Boston Consulting Group, Inc.