Thanks Aaron.

From: Aaron Davidson <ilike...@gmail.com>
Reply-To: <user@spark.apache.org>
Date: Monday, April 14, 2014 at 10:30 AM
To: <user@spark.apache.org>
Subject: Re: Spark resilience
Master and slave are somewhat overloaded terms in the Spark ecosystem (see the glossary: http://spark.apache.org/docs/latest/cluster-overview.html#glossary). Are you actually asking about the Spark "driver" and "executors", or the standalone cluster "master" and "workers"? To briefly answer for either possibility:

(1) Drivers are not fault tolerant, but they can be restarted automatically. Executors may be removed at any point without failing the job (though losing an Executor may slow the job significantly), and Executors may be added at any point and will be used immediately.

(2) Standalone cluster Masters are fault tolerant: a Master failure only temporarily stalls new jobs from starting or acquiring new resources, and does not affect currently-running jobs. Workers can fail, which simply causes jobs to lose their current Executors. New Workers can be added at any point.

On Mon, Apr 14, 2014 at 11:00 AM, Ian Ferreira <ianferre...@hotmail.com> wrote:
> Folks,
>
> I was wondering what the failure support modes were for Spark while
> running jobs.
>
> 1. What happens when a master fails?
> 2. What happens when a slave fails?
> 3. Can you add and remove slaves mid-job?
>
> Regarding the install on Mesos, if I understand correctly the Spark master is
> behind a ZooKeeper quorum, so that isolates the slaves from a master failure,
> but what about the masters behind the quorum?
>
> Cheers
> - Ian
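For the standalone-cluster case Aaron describes, both behaviors are configurable: Master failover is enabled through ZooKeeper recovery mode, and driver auto-restart through the `--supervise` flag of `spark-submit`. A minimal sketch follows; the ZooKeeper hostnames (`zk1`–`zk3`), the Master hostnames, and the application class/jar names are hypothetical placeholders, not values from this thread:

```shell
# conf/spark-env.sh on each Master: enable standby-Master failover
# backed by a ZooKeeper quorum (zk1/zk2/zk3 are placeholder hosts).
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER \
  -Dspark.deploy.zookeeper.url=zk1:2181,zk2:2181,zk3:2181 \
  -Dspark.deploy.zookeeper.dir=/spark"

# Submit with both Masters listed so the driver can find whichever is
# active, and with --supervise so a failed driver is restarted
# automatically (cluster deploy mode only). App class/jar are placeholders.
./bin/spark-submit \
  --master spark://master1:7077,master2:7077 \
  --deploy-mode cluster \
  --supervise \
  --class com.example.MyApp \
  myapp.jar
```

With this setup, losing the active Master stalls only new scheduling until a standby takes over, matching the behavior described above; running jobs keep their Executors.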