1. If we add more executors to a cluster where data is already cached (the RDDs already live on the existing executors), will the job run tasks on the new executors even though the cached RDD partitions are not present there? If yes, what is the performance like on those new executors?
2. What is the replication factor for Spark's in-memory storage (for Hadoop the default is 3), and can we change it for Spark as well?

On Tue, Apr 15, 2014 at 9:53 PM, Manoj Samel <manojsamelt...@gmail.com> wrote:

> Thanks Aaron, this is useful!
>
> - Manoj
>
>
> On Mon, Apr 14, 2014 at 8:12 PM, Aaron Davidson <ilike...@gmail.com> wrote:
>
>> Launching drivers inside the cluster was a feature added in 0.9, for
>> standalone cluster mode:
>> http://spark.apache.org/docs/latest/spark-standalone.html#launching-applications-inside-the-cluster
>>
>> Note the "supervise" flag, which will cause the driver to be restarted if
>> it fails. This is a rather low-level mechanism which by default will just
>> cause the whole job to rerun from the beginning. Special recovery would
>> have to be implemented by hand, via some sort of state checkpointing into a
>> globally visible storage system (e.g., HDFS), which, for example, Spark
>> Streaming already does.
>>
>> Currently, this feature is not supported in YARN or Mesos fine-grained
>> mode.
>>
>>
>> On Mon, Apr 14, 2014 at 2:08 PM, Manoj Samel <manojsamelt...@gmail.com> wrote:
>>
>>> Could you please elaborate how drivers can be restarted automatically?
>>>
>>> Thanks,
>>>
>>>
>>> On Mon, Apr 14, 2014 at 10:30 AM, Aaron Davidson <ilike...@gmail.com> wrote:
>>>
>>>> Master and slave are somewhat overloaded terms in the Spark ecosystem
>>>> (see the glossary:
>>>> http://spark.apache.org/docs/latest/cluster-overview.html#glossary).
>>>> Are you actually asking about the Spark "driver" and "executors", or the
>>>> standalone cluster "master" and "workers"?
>>>>
>>>> To briefly answer for either possibility:
>>>> (1) Drivers are not fault tolerant but can be restarted automatically.
>>>> Executors may be removed at any point without failing the job (though
>>>> losing an Executor may slow the job significantly), and Executors may be
>>>> added at any point and will be immediately used.
>>>> (2) Standalone cluster Masters are fault tolerant; a failure will only
>>>> temporarily stall new jobs from starting or getting new resources, but
>>>> does not affect currently running jobs. Workers can fail and will simply
>>>> cause jobs to lose their current Executors. New Workers can be added at
>>>> any point.
>>>>
>>>>
>>>> On Mon, Apr 14, 2014 at 11:00 AM, Ian Ferreira <ianferre...@hotmail.com> wrote:
>>>>
>>>>> Folks,
>>>>>
>>>>> I was wondering what the failure support modes were for Spark while
>>>>> running jobs:
>>>>>
>>>>> 1. What happens when a master fails?
>>>>> 2. What happens when a slave fails?
>>>>> 3. Can you add and remove slaves mid-job?
>>>>>
>>>>> Regarding the install on Mesos: if I understand correctly, the Spark
>>>>> master is behind a ZooKeeper quorum, so that isolates the slaves from a
>>>>> master failure, but what about the masters behind the quorum?
>>>>>
>>>>> Cheers
>>>>> - Ian
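On question 2 above: unlike HDFS, Spark does not replicate cached blocks by default (each cached partition has a single in-memory copy), but replication can be requested per-RDD through the `StorageLevel` passed to `persist()`. A minimal sketch, assuming an already-created `SparkContext` named `sc` and a reasonably recent Spark release:

```scala
import org.apache.spark.storage.StorageLevel

// cache() is shorthand for persist(StorageLevel.MEMORY_ONLY):
// one deserialized in-memory copy per partition, replication factor 1.
val rdd = sc.parallelize(1 to 1000)
rdd.cache()

// The "_2" storage levels (MEMORY_ONLY_2, MEMORY_AND_DISK_2, ...) keep
// two copies of each block on different executors, so losing a single
// executor does not force those partitions to be recomputed from lineage.
val replicated = sc.parallelize(1 to 1000)
replicated.persist(StorageLevel.MEMORY_ONLY_2)
```

Note that even without replication, cached data is not lost permanently when an executor dies: Spark recomputes the missing partitions from the RDD lineage, which costs time rather than correctness. This also relates to question 1: tasks scheduled on newly added executors will recompute (or fetch) the partitions they need, so the first pass on a new executor is slower than on one that already holds the cache.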