Hello,
I've been seeing the following errors when trying to save to S3:
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 4058 in stage 2.1 failed 4 times, most recent failure: Lost task 4058.3 in stage 2.1 (TID 12572, ip-10-81-151-40.ec2.internal):
java.io.FileNotFoundException: /mnt/spa$
k/spark-local-20140827191008-05ae/0c/shuffle_1_7570_5768 (No space left on device)
java.io.FileOutputStream.open(Native Method)
java.io.FileOutputStream.<init>(FileOutputStream.java:221)
org.apache.spark.storage.DiskBlockObjectWriter.open(BlockObjectWriter.scala:107)
org.apache.spark.storage.DiskBlockObjectWriter.write(BlockObjectWriter.scala:175$
org.apache.spark.shuffle.hash.HashShuffleWriter$$anonfun$write$1.apply(HashShuff$
eWriter.scala:67)
org.apache.spark.shuffle.hash.HashShuffleWriter$$anonfun$write$1.apply(HashShuff$
eWriter.scala:65)
scala.collection.Iterator$class.foreach(Iterator.scala:727)
scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65$
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
org.apache.spark.scheduler.Task.run(Task.scala:54)
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:745)
However, df reports plenty of free space on the worker node:
[root@ip-10-81-151-40 ~]$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvda1      7.9G  4.6G  3.3G  59% /
tmpfs           7.4G     0  7.4G   0% /dev/shm
/dev/xvdb        37G   11G   25G  30% /mnt
/dev/xvdf        37G  9.5G   26G  27% /mnt2
Any suggestions?
Dan