Hi all,

I am encountering the following error:

INFO scheduler.TaskSetManager: Loss was due to java.io.IOException: No
space left on device [duplicate 4]

For each slave, df -h looks roughly like this, which makes the above error
surprising:

Filesystem            Size  Used Avail Use% Mounted on
/dev/xvda1            7.9G  4.4G  3.5G  57% /
tmpfs                 7.4G  4.0K  7.4G   1% /dev/shm
/dev/xvdb              37G  3.3G   32G  10% /mnt
/dev/xvdf              37G  2.0G   34G   6% /mnt2
/dev/xvdv             500G   33M  500G   1% /vol

I'm on an EC2 cluster (c3.xlarge + 5 x m3) that I launched using the
spark-ec2 scripts and a clone of spark from today. The job I am running
closely resembles the collaborative filtering example
<https://spark.apache.org/docs/latest/mllib-collaborative-filtering.html>.
This issue happens with both the 1M and the 10M-rating versions of the
MovieLens dataset.
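
Roughly, the core of the job is a sketch like the following (the ratings path
and the rank/iteration values here are placeholders rather than my exact
settings, and sc is the SparkContext created with the config further down):

from pyspark.mllib.recommendation import ALS

# Placeholder path; MovieLens lines look like "user::movie::rating::timestamp".
lines = sc.textFile("/vol/ml-10M/ratings.dat")
ratings = lines.map(lambda l: l.split("::")) \
               .map(lambda f: (int(f[0]), int(f[1]), float(f[2])))

# Placeholder rank and iteration count.
model = ALS.train(ratings, 10, 10)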

I have seen previous questions on this
(<http://mail-archives.apache.org/mod_mbox/spark-user/201403.mbox/%3c532f5aec.8060...@nengoiksvelzud.com%3E>
and <https://groups.google.com/forum/#!msg/spark-users/Axx4optAj-E/q5lWMv-ZqnwJ>),
but they haven't helped yet. For example, I tried setting the Spark tmp
directory (spark.local.dir) to the EBS volume at /vol/, both by editing the
Spark conf file (and copy-dir'ing it to the slaves) and through SparkConf.
Yet I still get the above error. My current Spark config is below (with the
conf-file variant sketched after it). Note that I'm launching via
~/spark/bin/spark-submit.

from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setAppName("RecommendALS")
        .set("spark.local.dir", "/vol/")
        .set("spark.executor.memory", "7g")
        .set("spark.akka.frameSize", "100")
        .setExecutorEnv("SPARK_JAVA_OPTS", " -Dspark.akka.frameSize=100"))
sc = SparkContext(conf=conf)
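
For completeness, the conf-file attempt amounted to something like this sketch
(assuming the usual conf/spark-defaults.conf and conf/spark-env.sh layout; the
exact files on the spark-ec2 AMI may differ):

# conf/spark-defaults.conf
spark.local.dir    /vol/

# conf/spark-env.sh
export SPARK_LOCAL_DIRS=/vol/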

Thanks for any advice,
Chris
