Hi,

 

When running in yarn-client mode, the following options can be specified:

*  --executor-cores
*  --num-executors

 

Suppose we have the following machines:

*  3 data nodes
*  8 cores per node

 

Which is better?

1.  --executor-cores 7 --num-executors 3  (more cores per executor, leaving a few cores for other processes)

2.  --executor-cores 2 --num-executors 12  (more executors)
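
For reference, here is a rough sketch of the core arithmetic behind the two options above. It only multiplies the numbers given in this post (3 nodes, 8 cores each). The function name and the even spread of executors across nodes are my own assumptions for illustration. It does not model memory, the YARN application master, or whether YARN actually grants every requested container.

```python
# Cluster described in this post: 3 data nodes, 8 cores each.
NODES = 3
CORES_PER_NODE = 8


def summarize(executor_cores, num_executors):
    """Return the core totals for one --executor-cores / --num-executors pair.

    Hypothetical helper: assumes executors are spread evenly over the nodes.
    """
    requested = executor_cores * num_executors
    cluster = NODES * CORES_PER_NODE
    return {
        "requested_cores": requested,
        "cluster_cores": cluster,
        "unrequested_cores": cluster - requested,
        "executors_per_node": num_executors / NODES,
    }


option_1 = summarize(executor_cores=7, num_executors=3)
option_2 = summarize(executor_cores=2, num_executors=12)

print("option 1:", option_1)
print("option 2:", option_2)
```

So option 1 requests 21 of the 24 cores, while option 2 requests all 24. That difference alone is probably too small to account for the gap in run time.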

 

I expected 1. to be better, since it should save some communication overhead between tasks.

But in practice, 2. is faster by a large margin (34 minutes vs. 49 minutes).

 

Are there best practices for setting these options?

 

Thanks.

 
