So you have 90 GB of memory and 24 cores in total. Let's say you want to use 80% of all that memory (leaving the rest for other components), so you have 72 GB to use.

You want to take advantage of all the cores and all of the memory, so something close to this should work:

  executor memory     = 6g
  number of executors = 12
  cores per executor  = 2

That gives 6 GB x 12 = 72 GB of memory and 12 x 2 = 24 cores. As for your YARN question: in YARN, a container corresponds to an executor.
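If you launch with spark-submit on YARN, those settings map to flags roughly like the following (the application jar and its arguments here are placeholders, substitute your own):

  spark-submit \
    --master yarn \
    --num-executors 12 \
    --executor-cores 2 \
    --executor-memory 6g \
    your-app.jar [app args]

Equivalently, you can set the properties spark.executor.instances, spark.executor.cores and spark.executor.memory the same way you set them below.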
JESSE CHEN
Big Data Performance | IBM Analytics
Office: 408 463 2296
Mobile: 408 828 9068
Email: jfc...@us.ibm.com

From: Chadha Pooja <chadha.po...@bcg.com>
To: "user@spark.apache.org" <user@spark.apache.org>
Date: 03/03/2016 03:31 PM
Subject: Configuring/Optimizing Spark

Hi,

I am trying to understand the best parameter settings for processing a 12.5 GB file with my Spark cluster. I am using a 3-node cluster, with 8 cores and 30 GiB of RAM on each node. I used Cloudera's "top 5 mistakes" article and tried the following configuration:

  spark.executor.instances              6
  spark.executor.cores                  5
  spark.executor.memory                 15g
  yarn.scheduler.maximum-allocation-mb  10000

Can someone please confirm whether these are correct? As an aside, I would like to better understand how YARN works with Spark: could you help me with the difference between a container and an executor in YARN?

Thanks!
Pooja
The Boston Consulting Group, Inc.