Hi all, I am encountering the following error:
    INFO scheduler.TaskSetManager: Loss was due to java.io.IOException: No space left on device [duplicate 4]

For each slave, df -h looks roughly like this, which makes the above error surprising:

    Filesystem      Size  Used Avail Use% Mounted on
    /dev/xvda1      7.9G  4.4G  3.5G  57% /
    tmpfs           7.4G  4.0K  7.4G   1% /dev/shm
    /dev/xvdb        37G  3.3G   32G  10% /mnt
    /dev/xvdf        37G  2.0G   34G   6% /mnt2
    /dev/xvdv       500G   33M  500G   1% /vol

I'm on an EC2 cluster (c3.xlarge + 5 x m3) that I launched using the spark-ec2 scripts and a clone of Spark from today. The job I am running closely resembles the collaborative filtering example (https://spark.apache.org/docs/latest/mllib-collaborative-filtering.html). The issue happens with both the 1 million and the 10 million rating versions of the MovieLens dataset.

I have seen previous questions about this (http://mail-archives.apache.org/mod_mbox/spark-user/201403.mbox/%3c532f5aec.8060...@nengoiksvelzud.com%3E and https://groups.google.com/forum/#!msg/spark-users/Axx4optAj-E/q5lWMv-ZqnwJ), but they haven't helped yet. For example, I tried setting the Spark tmp directory to the EBS volume at /vol/, both by editing the Spark conf file (and copy-dir'ing it to the slaves) and through the SparkConf, yet I still get the above error.

Here is my current Spark config; note that I'm launching via ~/spark/bin/spark-submit:

    conf = SparkConf()
    conf.setAppName("RecommendALS") \
        .set("spark.local.dir", "/vol/") \
        .set("spark.executor.memory", "7g") \
        .set("spark.akka.frameSize", "100") \
        .setExecutorEnv("SPARK_JAVA_OPTS", " -Dspark.akka.frameSize=100")
    sc = SparkContext(conf=conf)

Thanks for any advice,
Chris
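
P.S. In case "editing the Spark conf file" is ambiguous: the conf-file approach I'm referring to is roughly the sketch below. I'm going from memory on the exact copy-dir location the spark-ec2 AMI uses, and I'm assuming spark-env.sh is the right place for SPARK_LOCAL_DIRS, so treat the paths as approximate.

    # on the master: point Spark's scratch space at the big EBS volume
    echo 'export SPARK_LOCAL_DIRS=/vol/' >> ~/spark/conf/spark-env.sh

    # push the updated conf dir out to the slaves
    ~/spark-ec2/copy-dir ~/spark/conf

    # restart the standalone cluster so the workers pick up the new setting
    ~/spark/sbin/stop-all.sh
    ~/spark/sbin/start-all.sh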