Hi Oleg,

Did you ever figure this out? I'm seeing the same exception, also in 0.9.1, and I think it might be related to setting spark.speculation=true. My theory is that multiple attempts at the same task start, the first one finishes and cleans up the _temporary directory, and then the second fails because _temporary is no longer there.
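One way to test this theory would be to turn speculation off explicitly and rerun the same save. Below is a minimal sketch, assuming a standalone application rather than the shell; the app name, the master URL, and the stand-in data are placeholders I made up, not values from this thread. If the exception still appears with speculation disabled, the theory is probably wrong.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SpeculationCheck {
  def main(args: Array[String]): Unit = {
    // Placeholder app name and master URL; substitute your own cluster settings.
    val conf = new SparkConf()
      .setAppName("speculation-check")
      .setMaster("spark://master-host:7077")
      // Disable speculative task attempts so only one attempt per task runs.
      .set("spark.speculation", "false")

    val sc = new SparkContext(conf)
    try {
      // Stand-in data; the original job used an RDD named d3 that is not shown here.
      val data = sc.parallelize(1 to 10000).map(i => (i, i.toString))
      data.sample(false, 0.01, 1).map(pair => pair._2).saveAsTextFile("10000.txt")
    } finally {
      sc.stop()
    }
  }
}
```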
Thanks!
Andrew

On Mon, Jun 9, 2014 at 1:19 PM, Oleg Proudnikov <oleg.proudni...@gmail.com> wrote:

> Hi All,
>
> After a few simple transformations I am trying to save to a local file
> system. The code works in local mode but not on a standalone cluster. The
> directory *10000.txt/_temporary* does exist after the exception.
>
> I would appreciate any suggestions.
>
> *scala> d3.sample(false,0.01,1).map( pair => pair._2 ).saveAsTextFile("10000.txt")*
>
> 14/06/09 22:06:40 ERROR TaskSetManager: Task 0.0:0 failed 4 times; aborting job
> *org.apache.spark.SparkException: Job aborted: Task 0.0:0 failed 4 times (most recent failure: Exception failure: java.io.IOException: The temporary job-output directory file:/data/spark-0.9.1-bin-hadoop1/10000.txt/_temporary doesn't exist!)*
>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1020)
>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1018)
>     at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>     at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>     at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$abortStage(DAGScheduler.scala:1018)
>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604)
>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604)
>     at scala.Option.foreach(Option.scala:236)
>     at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:604)
>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:190)
>     at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>     at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>     at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>     at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>     at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>     at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>     at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>     at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>     at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>
> Thank you,
> Oleg