Hi,

I am currently running Spark with the standalone scheduler on a 3-machine cluster: one machine runs the Spark Master and the other two each run a Spark Worker.
We run a machine learning application on this small-scale testbed. One stage of the application is divided into 10 parallel tasks, and I would like to understand the pros and cons of two different cluster configurations.

Conf 1: multiple executors, each running one task. Each worker hosts 5 executors, and each executor gets 1 CPU core. With this configuration the scheduler assigns one task to each executor, so each task probably runs in its own JVM.

Conf 2: one executor running multiple tasks. Each worker hosts a single executor with 5 CPU cores. In this case the scheduler assigns 5 tasks to each executor, so tasks in the same executor probably run in the same process but on different threads.

I think Conf 2 is preferable to Conf 1 in many cases, since tasks in the same executor can share the block manager, so data shared among those tasks (e.g. broadcast variables) does not need to be transferred multiple times. However, I am wondering whether there is a scenario where Conf 1 is preferable, and whether the same conclusion holds when the scheduler is YARN or Mesos.
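For concreteness, here is a sketch of how I would express the two configurations as spark-submit commands. The master URL, memory sizes, and jar name are placeholders, and Conf 1 assumes the standalone master is able to place multiple executors of one application on a worker, which may depend on the Spark version:

    # Conf 1: 1 core per executor -> the master can place up to 5
    # single-core executors (separate JVMs) on each 5-core worker
    ./bin/spark-submit \
      --master spark://master-host:7077 \
      --executor-cores 1 \
      --total-executor-cores 10 \
      --executor-memory 2g \
      ml-app.jar

    # Conf 2: 5 cores per executor -> one executor (one JVM) per worker,
    # running up to 5 tasks as threads in the same process
    ./bin/spark-submit \
      --master spark://master-host:7077 \
      --executor-cores 5 \
      --total-executor-cores 10 \
      --executor-memory 10g \
      ml-app.jar

Thanks!

Best,
Xiaoye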