It's a scheduler question: Spark will retry the task on the same worker. From Spark's standpoint the data is not replicated, because Spark provides fault tolerance through lineage, not through replication.
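To make that concrete, here is a minimal Scala sketch (app name and numbers are purely illustrative, not taken from your setup): the number of attempts per task is governed by spark.task.maxFailures, and a lost cached partition is recomputed from its lineage rather than fetched from a replica.

    import org.apache.spark.{SparkConf, SparkContext}

    // Sketch only: the app name and sizes are made up. Spark re-runs a failed
    // task up to spark.task.maxFailures times (default 4); a lost cached
    // partition is rebuilt from its lineage (parallelize -> map), not read
    // from a replica, because Spark does not keep replicas of partitions.
    val conf = new SparkConf()
      .setAppName("retry-lineage-sketch")
      .set("spark.task.maxFailures", "8") // raise this to tolerate more failed attempts

    val sc = new SparkContext(conf)

    val data    = sc.parallelize(1 to 1000000, numSlices = 8)
    val squares = data.map(x => x.toLong * x).cache() // cached, but still recomputable from lineage
    println(squares.sum())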
On 30 Jun 2015 01:50, "Max Demoulin" <maxdemou...@gmail.com> wrote:

> The underlying issue is a filesystem corruption on the workers.
>
> In the case where I use HDFS, with a sufficient number of replicas, would
> Spark try to launch a task on another node where the block replica is
> present?
>
> Thanks :-)
>
> --
> Henri Maxime Demoulin
>
> 2015-06-29 9:10 GMT-04:00 ayan guha <guha.a...@gmail.com>:
>
>> No, Spark cannot do that, as it does not replicate partitions (so no
>> retry on a different worker). It seems your cluster is not provisioned
>> with the correct permissions. I would suggest automating node
>> provisioning.
>>
>> On Mon, Jun 29, 2015 at 11:04 PM, maxdml <maxdemou...@gmail.com> wrote:
>>
>>> Hi there,
>>>
>>> I have some traces from my master and some workers where, for some
>>> reason, the ./work directory of an application cannot be created on the
>>> workers. There is also an issue with the master's temp directory
>>> creation.
>>>
>>> master logs: http://pastebin.com/v3NCzm0u
>>> worker logs: http://pastebin.com/Ninkscnx
>>>
>>> It seems that some of the executors can create the directories, but as
>>> some others are repeatedly failing, the job ends up failing. Shouldn't
>>> Spark manage to keep working with a smaller number of executors instead
>>> of failing?
>>>
>>> --
>>> View this message in context:
>>> http://apache-spark-user-list.1001560.n3.nabble.com/Directory-creation-failed-leads-to-job-fail-should-it-tp23531.html
>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>>
>> --
>> Best Regards,
>> Ayan Guha
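On the directory problem quoted above, a small hedged sketch (the path below is only an example, not a recommendation) of pointing Spark's scratch space somewhere the worker user can actually write. Note that in standalone mode the per-application ./work directory comes from SPARK_WORKER_DIR in conf/spark-env.sh on each worker, so the filesystem and permissions there still have to be fixed on every node.

    import org.apache.spark.{SparkConf, SparkContext}

    // Sketch only: /data/spark-tmp is an example path. spark.local.dir is the
    // scratch space for shuffle/spill files; the standalone worker's
    // per-application ./work directory is configured separately, via
    // SPARK_WORKER_DIR in conf/spark-env.sh on each worker, and must be
    // writable by the user running the worker process.
    val conf = new SparkConf()
      .setAppName("scratch-dir-sketch")
      .set("spark.local.dir", "/data/spark-tmp")

    val sc = new SparkContext(conf)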