Hi all,
Has anyone run into the same problem? Looking at the source code of the
official release (rc11), the spark.yarn.preserve.staging.files property
defaults to false. However, I'm seeing the .sparkStaging folder remain on HDFS,
and it fills up the disk pretty fast, because SparkContext deploys the fat JAR
file (~115MB) for every job and it is never cleaned up.

yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:
      val preserveFiles = sparkConf.get("spark.yarn.preserve.staging.files", "false").toBoolean

[test@spark ~]$ hdfs dfs -ls .sparkStaging
Found 46 items
drwx------   - test users          0 2014-05-01 01:42 .sparkStaging/application_1398370455828_0050
drwx------   - test users          0 2014-05-01 02:03 .sparkStaging/application_1398370455828_0051
drwx------   - test users          0 2014-05-01 02:04 .sparkStaging/application_1398370455828_0052
drwx------   - test users          0 2014-05-01 05:44 .sparkStaging/application_1398370455828_0053
drwx------   - test users          0 2014-05-01 05:45 .sparkStaging/application_1398370455828_0055
drwx------   - test users          0 2014-05-01 05:46 .sparkStaging/application_1398370455828_0056
drwx------   - test users          0 2014-05-01 05:49 .sparkStaging/application_1398370455828_0057
drwx------   - test users          0 2014-05-01 05:52 .sparkStaging/application_1398370455828_0058
drwx------   - test users          0 2014-05-01 05:58 .sparkStaging/application_1398370455828_0059
drwx------   - test users          0 2014-05-01 07:38 .sparkStaging/application_1398370455828_0060
drwx------   - test users          0 2014-05-01 07:41 .sparkStaging/application_1398370455828_0061
….
drwx------   - test users          0 2014-06-16 14:45 .sparkStaging/application_1402001910637_0131
drwx------   - test users          0 2014-06-16 15:03 .sparkStaging/application_1402001910637_0135
drwx------   - test users          0 2014-06-16 15:16 .sparkStaging/application_1402001910637_0136
drwx------   - test users          0 2014-06-16 15:46 .sparkStaging/application_1402001910637_0138
drwx------   - test users          0 2014-06-16 23:57 .sparkStaging/application_1402001910637_0157
drwx------   - test users          0 2014-06-17 05:55 .sparkStaging/application_1402001910637_0161

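
Until the root cause is found, a stopgap is to pick out the old staging directories from a listing like the one above. The following is only a rough sketch: stale_staging_dirs is a made-up helper name (not part of Spark or Hadoop), and it assumes the usual 8-column `hdfs dfs -ls` layout with an ISO date in column 6.

```shell
#!/bin/sh
# Hypothetical helper, not part of Spark or Hadoop.
# Reads `hdfs dfs -ls` output on stdin and prints the .sparkStaging
# application directories last modified before the given cutoff date.
# Assumed columns: perms, replication, owner, group, size, date, time, path.
stale_staging_dirs() {
  cutoff="$1"   # ISO date, e.g. 2014-06-01
  awk -v cutoff="$cutoff" '
    # The "Found N items" header has fewer than 8 fields and is skipped.
    # Appending "" forces a string comparison, which orders ISO dates correctly.
    NF >= 8 && $8 ~ /^\.sparkStaging\/application_/ && ($6 "") < (cutoff "") {
      print $8
    }
  '
}
```

It could be used as `hdfs dfs -ls .sparkStaging | stale_staging_dirs 2014-06-01`. The printed paths should be reviewed by hand before passing them to `hdfs dfs -rm -r`, since a directory belonging to a still-running application must not be removed.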
Is this something that needs to be set explicitly, as in:

SPARK_YARN_USER_ENV="spark.yarn.preserve.staging.files=false"

The documentation at http://spark.apache.org/docs/latest/running-on-yarn.html
says:

  spark.yarn.preserve.staging.files (default: false)
  Set to true to preserve the staged files (Spark jar, app jar, distributed
  cache files) at the end of the job rather than delete them.

Or is this a bug where the default value is not honored and is overridden to
true somewhere?
Thanks.


                                          
