Re: Blockmgr directories intermittently not being cleaned up

tBoyle Wed, 20 Jun 2018 10:27:08 -0700

I'm experiencing the same behaviour with shuffle data being orphaned on disk
(Spark 2.0.1 with Spark Streaming).


We are using AWS R4 EC2 instances with 300GB EBS volumes attached. Most
spilled shuffle data is eventually cleaned up by the ContextCleaner within
10 minutes. We run on Mesos and do not use the external shuffle service.

Occasionally some shuffle files are not removed until the application is
shut down gracefully or dies due to lack of disk space. I am confident the
orphaned shuffle data is not in use by any jobs after 5 minutes (batch
duration). Do you know of any possible causes of this shuffle data not
being cleaned up and being left orphaned on disk?



--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

