Apparently Spark Streaming 1.3.0 is not cleaning up its internal files, and the worker nodes eventually run out of inodes. We see tons of old shuffle_*.data and *.index files that are never deleted. How do we get Spark to remove these files?
We have a simple standalone app with one RabbitMQ receiver and a two-node cluster (2 x r3.large AWS instances). The batch interval is 10 minutes, after which we process the data and write the results to a DB. No windowing or state management is used. I've pored over the documentation and tried setting the following properties, but they have not helped. As a workaround we're using a cron script that periodically cleans up old files, but this has a bad smell to it.

Set via SPARK_WORKER_OPTS in spark-env.sh on every worker node:
  spark.worker.cleanup.enabled true
  spark.worker.cleanup.interval
  spark.worker.cleanup.appDataTtl

Also tried on the driver side:
  spark.cleaner.ttl
  spark.shuffle.consolidateFiles true
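For concreteness, this is roughly how the settings were passed; the interval and TTL numbers below are placeholder values for illustration, not the exact ones we tried:

  # spark-env.sh on each worker node (values in seconds)
  SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true \
    -Dspark.worker.cleanup.interval=1800 \
    -Dspark.worker.cleanup.appDataTtl=86400"

  # spark-defaults.conf on the driver
  spark.cleaner.ttl               3600
  spark.shuffle.consolidateFiles  true

Even with the workers restarted so the SPARK_WORKER_OPTS took effect, the shuffle_*.data and *.index files for the still-running streaming app keep accumulating.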