These are temporary files created by the shuffle. They are not needed for checkpointing and are stored only in each node's local temp directory, which is "/tmp" by default. If your "/tmp" doesn't have enough space, you can use `spark.local.dir` to point to a different path.
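If it helps, here is a minimal sketch of how you might choose that path. It only uses the Python standard library: it checks the free space on a few candidate directories and builds the `--conf spark.local.dir=...` flag for `spark-submit`. The helper names, the candidate list, and the 1 GiB threshold are made up for illustration; `spark.local.dir` itself accepts a comma-separated list of directories. Note that some cluster managers override this setting with their own local-directory configuration, so please verify against your deployment.

```python
import os
import shutil
import tempfile


def free_gib(path):
    """Return the free space on the filesystem holding `path`, in GiB."""
    return shutil.disk_usage(path).free / 2**30


def local_dir_conf(candidates, min_free_gib):
    """Build a spark-submit flag from the candidates with enough free space.

    Hypothetical helper: returns None when no candidate qualifies.
    """
    roomy = [
        d for d in candidates
        if os.path.isdir(d) and free_gib(d) >= min_free_gib
    ]
    if not roomy:
        return None
    # spark.local.dir takes a comma-separated list of directories.
    return "--conf spark.local.dir=" + ",".join(roomy)


if __name__ == "__main__":
    # Assumed candidate: the system temp dir. Replace with your scratch disks.
    candidates = [tempfile.gettempdir()]
    print(local_dir_conf(candidates, min_free_gib=1.0))
```

The printed flag can be appended to your `spark-submit` command. The same directories have to exist on every node, since each executor writes its shuffle files locally.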

Best Regards,
Shixiong Zhu

2015-09-29 1:04 GMT+08:00 swetha <swethakasire...@gmail.com>:

> Hi,
>
> I see a lot of data getting filled locally as shown below from my streaming
> job. I have my checkpoint set to hdfs. But, I still see the following data
> filling my local nodes. Any idea if I can make this stored in hdfs instead
> of storing the data locally?
>
> -rw-r--r-- 1 520 Sep 17 18:43 shuffle_23119_5_0.index
> -rw-r--r-- 1 180564255 Sep 17 18:43 shuffle_23129_2_0.data
> -rw-r--r-- 1 364850277 Sep 17 18:45 shuffle_23145_8_0.data
> -rw-r--r-- 1 267583750 Sep 17 18:46 shuffle_23105_4_0.data
> -rw-r--r-- 1 136178819 Sep 17 18:48 shuffle_23123_8_0.data
> -rw-r--r-- 1 159931184 Sep 17 18:48 shuffle_23167_8_0.data
> -rw-r--r-- 1 520 Sep 17 18:49 shuffle_23315_7_0.index
> -rw-r--r-- 1 520 Sep 17 18:50 shuffle_23319_3_0.index
> -rw-r--r-- 1 92240350 Sep 17 18:51 shuffle_23305_2_0.data
> -rw-r--r-- 1 40380158 Sep 17 18:51 shuffle_23323_6_0.data
> -rw-r--r-- 1 369653284 Sep 17 18:52 shuffle_23103_6_0.data
> -rw-r--r-- 1 371932812 Sep 17 18:52 shuffle_23125_6_0.data
> -rw-r--r-- 1 19857974 Sep 17 18:53 shuffle_23291_19_0.data
> -rw-r--r-- 1 55342005 Sep 17 18:53 shuffle_23305_8_0.data
> -rw-r--r-- 1 92920590 Sep 17 18:53 shuffle_23303_4_0.data
>
> Thanks,
> Swetha
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-streaming-job-filling-a-lot-of-data-in-local-spark-nodes-tp24846.html