It's a scheduler question: Spark will retry the task on the same worker. From Spark's standpoint the data is not replicated, because Spark provides fault tolerance through lineage, not through replication.
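To make that concrete, here is a minimal Scala sketch (app name and numbers are purely illustrative, not taken from your setup): the number of attempts per task is governed by spark.task.maxFailures, and a lost cached partition is recomputed from its lineage rather than fetched from a replica.

    import org.apache.spark.{SparkConf, SparkContext}

    // Sketch only: the app name and sizes are made up. Spark re-runs a failed
    // task up to spark.task.maxFailures times (default 4); a lost cached
    // partition is rebuilt from its lineage (parallelize -> map), not read
    // from a replica, because Spark does not keep replicas of partitions.
    val conf = new SparkConf()
      .setAppName("retry-lineage-sketch")
      .set("spark.task.maxFailures", "8") // raise this to tolerate more failed attempts

    val sc = new SparkContext(conf)

    val data    = sc.parallelize(1 to 1000000, numSlices = 8)
    val squares = data.map(x => x.toLong * x).cache() // cached, but still recomputable from lineage
    println(squares.sum())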
On 30 Jun 2015 01:50, "Max Demoulin" <maxdemou...@gmail.com> wrote:

> The underlying issue is a filesystem corruption on the workers.
>
> In the case where I use HDFS, with a sufficient number of replicas, would
> Spark try to launch a task on another node where the block replica is
> present?
>
> Thanks :-)
>
> --
> Henri Maxime Demoulin
>
> 2015-06-29 9:10 GMT-04:00 ayan guha <guha.a...@gmail.com>:
>
>> No, Spark cannot do that, as it does not replicate partitions (so no
>> retry on a different worker). It seems your cluster is not provisioned
>> with the correct permissions. I would suggest automating node
>> provisioning.
>>
>> On Mon, Jun 29, 2015 at 11:04 PM, maxdml <maxdemou...@gmail.com> wrote:
>>
>>> Hi there,
>>>
>>> I have some traces from my master and some workers where, for some
>>> reason, the ./work directory of an application cannot be created on the
>>> workers. There is also an issue with the master's temp directory
>>> creation.
>>>
>>> master logs: http://pastebin.com/v3NCzm0u
>>> worker logs: http://pastebin.com/Ninkscnx
>>>
>>> It seems that some of the executors can create the directories, but as
>>> some others are repeatedly failing, the job ends up failing. Shouldn't
>>> Spark manage to keep working with a smaller number of executors instead
>>> of failing?
>>>
>>> --
>>> View this message in context:
>>> http://apache-spark-user-list.1001560.n3.nabble.com/Directory-creation-failed-leads-to-job-fail-should-it-tp23531.html
>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>>
>> --
>> Best Regards,
>> Ayan Guha
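On the directory problem quoted above, a small hedged sketch (the path below is only an example, not a recommendation) of pointing Spark's scratch space somewhere the worker user can actually write. Note that in standalone mode the per-application ./work directory comes from SPARK_WORKER_DIR in conf/spark-env.sh on each worker, so the filesystem and permissions there still have to be fixed on every node.

    import org.apache.spark.{SparkConf, SparkContext}

    // Sketch only: /data/spark-tmp is an example path. spark.local.dir is the
    // scratch space for shuffle/spill files; the standalone worker's
    // per-application ./work directory is configured separately, via
    // SPARK_WORKER_DIR in conf/spark-env.sh on each worker, and must be
    // writable by the user running the worker process.
    val conf = new SparkConf()
      .setAppName("scratch-dir-sketch")
      .set("spark.local.dir", "/data/spark-tmp")

    val sc = new SparkContext(conf)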