Re: Spark streaming job filling a lot of data in local spark nodes

Shixiong Zhu Mon, 28 Sep 2015 19:03:21 -0700

These are temporary files created by shuffle. They are not needed for
checkpointing and are stored only in your local temp directory, which is
"/tmp" by default. If your "/tmp" doesn't have enough space, you can use
`spark.local.dir` to set a different path.
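
As a minimal sketch of that setting, assuming a hypothetical directory
/data/spark-tmp that exists on every node and has enough free space, the
property could go in conf/spark-defaults.conf:

```
# conf/spark-defaults.conf
# /data/spark-tmp is a placeholder path, not one taken from this thread.
spark.local.dir    /data/spark-tmp
```

The same property can also be passed per job with
`spark-submit --conf spark.local.dir=/data/spark-tmp`. Note that the cluster
manager may override this value: SPARK_LOCAL_DIRS on standalone or Mesos,
and LOCAL_DIRS on YARN.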


Best Regards,
Shixiong Zhu

2015-09-29 1:04 GMT+08:00 swetha <swethakasire...@gmail.com>:

>
> Hi,
>
> My streaming job is writing a lot of data to local disk, as shown below.
> My checkpoint is set to HDFS, but I still see the following files filling
> up my local nodes. Is there a way to store this data in HDFS instead of
> storing it locally?
>
> -rw-r--r--  1       520 Sep 17 18:43 shuffle_23119_5_0.index
> -rw-r--r--  1 180564255 Sep 17 18:43 shuffle_23129_2_0.data
> -rw-r--r--  1 364850277 Sep 17 18:45 shuffle_23145_8_0.data
> -rw-r--r--  1 267583750 Sep 17 18:46 shuffle_23105_4_0.data
> -rw-r--r--  1 136178819 Sep 17 18:48 shuffle_23123_8_0.data
> -rw-r--r--  1 159931184 Sep 17 18:48 shuffle_23167_8_0.data
> -rw-r--r--  1       520 Sep 17 18:49 shuffle_23315_7_0.index
> -rw-r--r--  1       520 Sep 17 18:50 shuffle_23319_3_0.index
> -rw-r--r--  1  92240350 Sep 17 18:51 shuffle_23305_2_0.data
> -rw-r--r--  1  40380158 Sep 17 18:51 shuffle_23323_6_0.data
> -rw-r--r--  1 369653284 Sep 17 18:52 shuffle_23103_6_0.data
> -rw-r--r--  1 371932812 Sep 17 18:52 shuffle_23125_6_0.data
> -rw-r--r--  1  19857974 Sep 17 18:53 shuffle_23291_19_0.data
> -rw-r--r--  1  55342005 Sep 17 18:53 shuffle_23305_8_0.data
> -rw-r--r--  1  92920590 Sep 17 18:53 shuffle_23303_4_0.data
>
>
> Thanks,
> Swetha
