Re: spark.cleaner.ttl for 1.4.1

2015-11-30 Thread Josh Rosen
… On Mon, Nov 30, 2015 at 8:46 AM Michal Čizmazia wrote: > Does spark.cleaner.ttl still need to be used for Spark 1.4.1 long-running streaming jobs? Or does ContextCleaner alone do all the cleaning?

spark.cleaner.ttl for 1.4.1

2015-11-30 Thread Michal Čizmazia
Does spark.cleaner.ttl still need to be used for Spark 1.4.1 long-running streaming jobs? Or does ContextCleaner alone do all the cleaning?

RE: [SparkStreaming 1.3.0] Broadcast failure after setting "spark.cleaner.ttl"

2015-06-09 Thread Shao, Saisai
Jerry, I agree with you. However, in my case, I kept monitoring the "blockmanager" folder. I do see that sometimes the number of files …

RE: [SparkStreaming 1.3.0] Broadcast failure after setting "spark.cleaner.ttl"

2015-06-09 Thread Haopu Wang
… deleted somehow. Shao, Saisai wrote (Tuesday, June 09, 2015 4:33 PM): > From the stack I think this problem m…

RE: [SparkStreaming 1.3.0] Broadcast failure after setting "spark.cleaner.ttl"

2015-06-09 Thread Shao, Saisai
From the stack, I think this problem may be due to the deletion of the broadcast variable. Since you set spark.cleaner.ttl, the old broadcast variable is deleted after that timeout limit, and you will meet this exception when you try to use it again after the time limit.
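
For illustration, a minimal sketch of this failure mode, assuming Spark's Scala API (config values and names are made up, not from the thread):

    import org.apache.spark.{SparkConf, SparkContext}

    // Illustrative only: with a TTL configured, old metadata and broadcast
    // blocks are removed once they are older than the limit.
    val conf = new SparkConf()
      .setAppName("broadcast-ttl-example")
      .set("spark.cleaner.ttl", "1800")          // 30 minutes

    val sc = new SparkContext(conf)
    val lookup = sc.broadcast(Map("a" -> 1, "b" -> 2))

    // Jobs submitted within the TTL window can read the broadcast normally.
    sc.parallelize(Seq("a", "b")).map(k => lookup.value.getOrElse(k, 0)).collect()

    // If a later job (e.g. a streaming batch more than 1800 s after creation)
    // references `lookup` again, its blocks may already have been cleaned up,
    // which matches the broadcast failure described in this thread.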

Re: [SparkStreaming 1.3.0] Broadcast failure after setting "spark.cleaner.ttl"

2015-06-09 Thread Benjamin Fradet
Hi, are you restarting your Spark Streaming context through getOrCreate? On 9 Jun 2015 09:30, "Haopu Wang" wrote: > When I ran a Spark Streaming application for a longer time, I noticed the local directory's size kept increasing. I set "spark.cleaner.ttl"…
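
A minimal sketch, assuming Spark Streaming's Scala API, of restarting a streaming context through getOrCreate (checkpoint path and app name are illustrative):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val checkpointDir = "/tmp/streaming-checkpoint"   // illustrative path

    def createContext(): StreamingContext = {
      val conf = new SparkConf().setAppName("streaming-restart-example")
      val ssc = new StreamingContext(conf, Seconds(10))
      ssc.checkpoint(checkpointDir)
      // ... define the DStream graph here ...
      ssc
    }

    // Recovers an existing context from the checkpoint directory if one exists,
    // otherwise builds a fresh one via createContext().
    val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
    ssc.start()
    ssc.awaitTermination()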

[SparkStreaming 1.3.0] Broadcast failure after setting "spark.cleaner.ttl"

2015-06-09 Thread Haopu Wang
When I ran a Spark Streaming application for a longer time, I noticed the local directory's size kept increasing. I set "spark.cleaner.ttl" to 1800 seconds in order to clean the metadata. The Spark Streaming batch duration is 10 seconds and the checkpoint duration is 10 minutes. The setting…
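
A minimal sketch of the setup described above, assuming Spark Streaming's Scala API (the input source, app name, and checkpoint directory are placeholders):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Minutes, Seconds, StreamingContext}

    val conf = new SparkConf()
      .setAppName("cleaner-ttl-streaming")             // illustrative name
      .set("spark.cleaner.ttl", "1800")                // metadata older than 30 min is dropped

    val ssc = new StreamingContext(conf, Seconds(10))  // 10-second batch duration
    ssc.checkpoint("/tmp/streaming-checkpoint")        // illustrative checkpoint directory

    val lines = ssc.socketTextStream("localhost", 9999) // placeholder input source
    lines.checkpoint(Minutes(10))                       // 10-minute checkpoint interval
    lines.print()

    ssc.start()
    ssc.awaitTermination()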

spark.cleaner.ttl

2014-10-01 Thread SK
Hi, I am using Spark v1.1.0. The default value of spark.cleaner.ttl is infinite as per the online docs. Since a lot of shuffle files are generated in /tmp/spark-local* and the disk is running out of space, we tested with a smaller value of the TTL. However, even when the job has completed and the timer…

Re: spark.cleaner.ttl and spark.streaming.unpersist

2014-09-10 Thread Tim Smith
… a heap of 30G, roughly ~16G is taken by "[B", which is byte arrays. Still investigating more and would appreciate pointers for troubleshooting. I have dumped the heap of a receiver and will try to go over it.

Re: spark.cleaner.ttl and spark.streaming.unpersist

2014-09-10 Thread Tim Smith
…omehow missed that parameter when I was reviewing the documentation, that should do the trick! Thank you! 2014-09-10 2:10 GMT+01:00 Shao, Saisai: > Hi Luis, the parameter “s…

Re: spark.cleaner.ttl and spark.streaming.unpersist

2014-09-10 Thread Yana Kadiyska
… that should do the trick! Thank you! 2014-09-10 2:10 GMT+01:00 Shao, Saisai: > Hi Luis, the parameter “spark.cleaner.ttl” and “spark.streaming.unpersist” can be used to remove useless timeo…

Re: spark.cleaner.ttl and spark.streaming.unpersist

2014-09-10 Thread Tim Smith
… 2014 at 1:43 AM, Luis Ángel Vicente Sánchez <langel.gro...@gmail.com> wrote: > I somehow missed that parameter when I was reviewing the documentation, that should do the trick! Thank you! 2014-09-10 2:10 GMT+01:00 Shao, Saisai: > Hi Luis, …

Re: spark.cleaner.ttl and spark.streaming.unpersist

2014-09-10 Thread Luis Ángel Vicente Sánchez
I somehow missed that parameter when I was reviewing the documentation, that should do the trick! Thank you! 2014-09-10 2:10 GMT+01:00 Shao, Saisai: > Hi Luis, the parameter “spark.cleaner.ttl” and “spark.streaming.unpersist” can be used to remove useless timeout s…

RE: spark.cleaner.ttl and spark.streaming.unpersist

2014-09-09 Thread Shao, Saisai
Hi Luis, the parameters “spark.cleaner.ttl” and “spark.streaming.unpersist” can both be used to remove stale, timed-out streaming data. The difference is that “spark.cleaner.ttl” is a time-based cleaner: it cleans not only streaming input data but also Spark’s stale metadata; while…
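
A small sketch of how the two settings discussed here are set on a SparkConf (values are illustrative, not recommendations):

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .setAppName("cleanup-settings")
      // Time-based cleaner: periodically drops metadata (and old streaming input)
      // purely by age, whether or not it is still referenced.
      .set("spark.cleaner.ttl", "3600")
      // Streaming-specific cleanup: unpersist generated RDDs once Spark Streaming
      // knows they are no longer needed.
      .set("spark.streaming.unpersist", "true")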

spark.cleaner.ttl and spark.streaming.unpersist

2014-09-09 Thread Luis Ángel Vicente Sánchez
…ed to use spark.cleaner.ttl and spark.streaming.unpersist together to mitigate that problem. I also wonder whether new RDDs are being batched while an RDD is being processed. Regards, Luis

Re: "spark.streaming.unpersist" and "spark.cleaner.ttl"

2014-07-26 Thread Tathagata Das
… be precisely aligned with the latest code, so the best way is to check the code. Haopu Wang wrote (Wednesday, July 23, 2014 5:56 PM): …

RE: "spark.streaming.unpersist" and "spark.cleaner.ttl"

2014-07-23 Thread Shao, Saisai
" and "spark.cleaner.ttl" Jerry, thanks for the response. For the default storage level of DStream, it looks like Spark's document is wrong. In this link: http://spark.apache.org/docs/latest/streaming-programming-guide.html#memory-tuning It mentions: "Default persistence level

RE: "spark.streaming.unpersist" and "spark.cleaner.ttl"

2014-07-23 Thread Haopu Wang
." I will take a look at DStream.scala although I have no Scala experience. -Original Message- From: Shao, Saisai [mailto:saisai.s...@intel.com] Sent: 2014年7月23日 15:13 To: user@spark.apache.org Subject: RE: "spark.streaming.unpersist" and "spark.cleaner.ttl" Hi Haop

RE: "spark.streaming.unpersist" and "spark.cleaner.ttl"

2014-07-23 Thread Shao, Saisai
Hi Haopu, please see the inline comments. Thanks, Jerry. Haopu Wang wrote (Wednesday, July 23, 2014 3:00 PM): > I have a DStream r…

"spark.streaming.unpersist" and "spark.cleaner.ttl"

2014-07-23 Thread Haopu Wang
I have a DStream receiving data from a socket. I'm using local mode. I set "spark.streaming.unpersist" to "false" and leave "spark.cleaner.ttl" at its infinite default. I can see files for input and shuffle blocks under the "spark.local.dir" folder, and the si…
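
A minimal sketch of this setup, assuming Spark Streaming's Scala API (host, port, and batch duration are placeholders):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf()
      .setMaster("local[2]")                        // local mode
      .setAppName("socket-dstream-example")
      .set("spark.streaming.unpersist", "false")    // keep generated RDDs around
      // spark.cleaner.ttl is left unset, i.e. infinite

    val ssc = new StreamingContext(conf, Seconds(10))
    val lines = ssc.socketTextStream("localhost", 9999) // placeholder host and port
    lines.print()

    ssc.start()
    ssc.awaitTermination()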

Re: is spark.cleaner.ttl safe?

2014-03-11 Thread Sourav Chandra
Yes, we are also facing the same problem. The workaround we came up with is: store the broadcast variable's id when it is first created, then run a scheduled job every (spark.cleaner.ttl - 1 minute) interval that re-creates the same broadcast variable using the same id. This way Spark is…
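
A rough driver-side sketch of this kind of scheduled refresh. Note that re-creating a broadcast under the same id, as described above, relies on Spark internals; this variant uses only the public API and simply swaps in a fresh broadcast before the TTL can expire, with jobs reading the current one via ref.get():

    import java.util.concurrent.{Executors, TimeUnit}
    import java.util.concurrent.atomic.AtomicReference

    import scala.reflect.ClassTag

    import org.apache.spark.SparkContext
    import org.apache.spark.broadcast.Broadcast

    // Re-broadcast on a schedule shorter than spark.cleaner.ttl so a live
    // broadcast is always available to newly submitted jobs.
    def scheduleRebroadcast[T: ClassTag](sc: SparkContext, value: T, ttlSeconds: Long)
        : AtomicReference[Broadcast[T]] = {
      val ref = new AtomicReference(sc.broadcast(value))
      val periodSeconds = ttlSeconds - 60           // (spark.cleaner.ttl - 1 minute)
      val scheduler = Executors.newSingleThreadScheduledExecutor()
      scheduler.scheduleAtFixedRate(new Runnable {
        override def run(): Unit = ref.set(sc.broadcast(value))
      }, periodSeconds, periodSeconds, TimeUnit.SECONDS)
      ref
    }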

Re: is spark.cleaner.ttl safe?

2014-03-11 Thread Aaron Davidson
And to answer your original question, spark.cleaner.ttl is not safe for the exact reason you brought up. The PR Mark linked intends to provide a much cleaner (and safer) solution. On Tue, Mar 11, 2014 at 2:01 PM, Mark Hamstra wrote: > Actually, TD's work-in-progress is probably more…

Re: is spark.cleaner.ttl safe?

2014-03-11 Thread Mark Hamstra
…stem with limited disk space. I believe there's enough space if Spark would clean up unused data from previous iterations, but as it stands the number of iterations I can run is limited by available disk space. I found a thread on the usage of spark.cleaner.ttl on the old…

is spark.cleaner.ttl safe?

2014-03-11 Thread Michael Allman
… limited by available disk space. I found a thread on the usage of spark.cleaner.ttl on the old Spark Users Google group here: https://groups.google.com/forum/#!topic/spark-users/9ebKcNCDih4 I think this setting may be what I'm looking for; however, the cleaner seems to delete data that…