For your information, I've attached a Ganglia monitoring screen capture to the Stack Overflow question.
Please see: http://stackoverflow.com/questions/24622108/apache-spark-the-number-of-cores-vs-the-number-of-executors

From: innowireless TaeYun Kim [mailto:taeyun....@innowireless.co.kr]
Sent: Tuesday, July 08, 2014 9:43 AM
To: user@spark.apache.org
Subject: The number of cores vs. the number of executors

Hi,

I'm trying to understand the relationship between the number of cores and the number of executors when running a Spark job on YARN.

The test environment is as follows:
- # of data nodes: 3
- Data node machine spec:
  - CPU: Core i7-4790 (# of cores: 4, # of threads: 8)
  - RAM: 32GB (8GB x 4)
  - HDD: 8TB (2TB x 4)
- Network: 1Gb
- Spark job flow: sc.textFile -> filter -> map -> filter -> mapToPair -> reduceByKey -> map -> saveAsTextFile (a minimal sketch of this flow appears after this message)
- Input data:
  - Type: single text file
  - Size: 165GB
  - # of lines: 454,568,833
- Output:
  - # of lines after second filter: 310,640,717
  - # of lines of the result file: 99,848,268
  - Size of the result file: 41GB

The job was run with the following configurations:
1) --master yarn-client --executor-memory 19G --executor-cores 7 --num-executors 3 (one executor per data node, using as many cores as possible)
2) --master yarn-client --executor-memory 19G --executor-cores 4 --num-executors 3 (# of cores reduced)
3) --master yarn-client --executor-memory 4G --executor-cores 2 --num-executors 12 (fewer cores, more executors)

Elapsed times:
1) 50 min 15 sec
2) 55 min 48 sec
3) 31 min 23 sec

To my surprise, 3) was much faster. I thought that 1) would be faster, since there would be less inter-executor communication when shuffling. And although the total # of cores of 1) (3 x 7 = 21) is fewer than that of 3) (12 x 2 = 24), core count alone cannot be the key factor, since 2) (3 x 4 = 12 cores) performed almost as well as 1).

How can I understand this result?

Thanks.
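For reference, here is a minimal sketch of the job flow described above. It assumes the Spark Java API (the mapToPair step suggests JavaRDD rather than the Scala API); the input/output paths, filter predicates, and map functions are hypothetical placeholders, since the actual logic is not given in the post.

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import scala.Tuple2;

    public class JobFlowSketch {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf().setAppName("cores-vs-executors-test");
            JavaSparkContext sc = new JavaSparkContext(conf);

            // sc.textFile -> filter -> map -> filter -> mapToPair -> reduceByKey -> map -> saveAsTextFile
            JavaRDD<String> lines = sc.textFile("hdfs:///input/data.txt");  // hypothetical path

            JavaPairRDD<String, Long> counts = lines
                .filter(line -> !line.isEmpty())               // first filter (placeholder predicate)
                .map(String::trim)                             // map (placeholder transform)
                .filter(line -> line.contains("\t"))           // second filter (placeholder predicate)
                .mapToPair(line -> new Tuple2<>(line.split("\t")[0], 1L))  // key/value extraction (placeholder)
                .reduceByKey(Long::sum);                       // the shuffle happens here

            counts
                .map(kv -> kv._1() + "\t" + kv._2())           // format result lines
                .saveAsTextFile("hdfs:///output/result");      // hypothetical path

            sc.stop();
        }
    }

With configuration 3), for example, such a job would be launched along these lines (class and jar names are hypothetical):

    spark-submit --master yarn-client --executor-memory 4G --executor-cores 2 \
        --num-executors 12 --class JobFlowSketch job.jar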