On Mon, Nov 30, 2015 at 8:46 AM, Michal Čizmazia wrote:
> Does spark.cleaner.ttl still need to be used for Spark 1.4.1 long-running
> streaming jobs? Or does ContextCleaner alone do all the cleaning?
From: Haopu Wang
Sent: Tuesday, June 9, 2015 5:28 PM
To: Shao, Saisai; user
Subject: RE: [SparkStreaming 1.3.0] Broadcast failure after setting "spark.cleaner.ttl"
Jerry, I agree with you.
However, in my case, I kept monitoring the "blockmanager" folder. I do see
the number of files go down sometimes, so some files are deleted somehow.
-----Original Message-----
From: Shao, Saisai [mailto:saisai.s...@intel.com]
Sent: Tuesday, June 09, 2015 4:33 PM
To: Haopu Wang; user
Subject: RE: [SparkStreaming 1.3.0] Broadcast failure after setting "spark.cleaner.ttl"
From the stack, I think this problem may be due to the deletion of the
broadcast variable: since you set spark.cleaner.ttl, the old broadcast
variable is deleted once that timeout passes, and you hit this exception
when you try to use it again after the time limit.
Hi,
Are you restarting your Spark streaming context through getOrCreate?
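For reference, a minimal sketch of the getOrCreate pattern (the checkpoint
path and app name here are made up for illustration):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object RestartableApp {
      val checkpointDir = "hdfs:///tmp/streaming-checkpoint"  // hypothetical path

      def createContext(): StreamingContext = {
        val conf = new SparkConf().setAppName("RestartableApp")
        val ssc = new StreamingContext(conf, Seconds(10))
        ssc.checkpoint(checkpointDir)
        // ... define the DStream graph here ...
        ssc
      }

      def main(args: Array[String]): Unit = {
        // Recovers the context (and its DStream graph) from the checkpoint
        // if one exists; otherwise builds a fresh context via the factory.
        val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
        ssc.start()
        ssc.awaitTermination()
      }
    }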
On 9 Jun 2015 09:30, "Haopu Wang" wrote:
> When I ran a spark streaming application for longer, I noticed that the
> local directory's size kept increasing.
> I set "spark.cleaner.ttl" to 1800 seconds in order to clean the metadata.
> The spark streaming batch duration is 10 seconds and the checkpoint
> duration is 10 minutes.
> The setting ...
Hi,
I am using Spark v1.1.0. The default value of spark.cleaner.ttl is infinite
as per the online docs. Since a lot of shuffle files are generated in
/tmp/spark-local* and the disk is running out of space, we tested with a
smaller value of ttl. However, even when the job has completed and the
timer ...
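For context, a minimal sketch of how a smaller TTL can be set (the app name
and the 3600-second value are illustrative, not a recommendation):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("TtlTest")  // hypothetical app name
      // Metadata, shuffle files, and old broadcasts beyond this age (in
      // seconds) become eligible for cleanup; the default is infinite.
      .set("spark.cleaner.ttl", "3600")
    val sc = new SparkContext(conf)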
>>> ... a heap of 30G, roughly ~16G is taken by "[B", which is byte arrays.
>>>
>>> Still investigating more; I would appreciate pointers for
>>> troubleshooting. I have dumped the heap of a receiver and will try to
>>> go over it.
On Sep 10, 2014 at 1:43 AM, Luis Ángel Vicente Sánchez
<langel.gro...@gmail.com> wrote:
I somehow missed that parameter when I was reviewing the documentation,
that should do the trick! Thank you!

2014-09-10 2:10 GMT+01:00 Shao, Saisai :
Hi Luis,

The parameters "spark.cleaner.ttl" and "spark.streaming.unpersist" can be
used to remove useless timed-out streaming data. The difference is that
"spark.cleaner.ttl" is a time-based cleaner: it cleans not only streaming
input data but also Spark's useless metadata; while ...
...ed to use spark.cleaner.ttl and spark.streaming.unpersist together to
mitigate that problem. And I also wonder if new RDDs are being batched
while an RDD is being processed.

Regards,

Luis
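A minimal sketch of using the two settings together, as described above
(app name and values are illustrative):

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .setAppName("StreamingWithCleanup")  // hypothetical app name
      // Time-based cleaner: clears old metadata, shuffle files, and
      // broadcasts after the given number of seconds.
      .set("spark.cleaner.ttl", "1800")
      // Lets Spark Streaming unpersist input RDDs once it is done with them.
      .set("spark.streaming.unpersist", "true")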
> ... be precisely aligned with the latest code, so the best way is to
> check the code.
>
> -----Original Message-----
> From: Haopu Wang [mailto:hw...@qilinsoft.com]
> Sent: Wednesday, July 23, 2014 5:56 PM
> To: user@spark.apache.org
> Subject: RE: "spark.streaming.unpersist" and "spark.cleaner.ttl"
Jerry, thanks for the response.
For the default storage level of DStream, it looks like Spark's document is
wrong. In this link:
http://spark.apache.org/docs/latest/streaming-programming-guide.html#memory-tuning
It mentions:
"Default persistence level
."
I will take a look at DStream.scala although I have no Scala experience.
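One way to pin the persistence level down in code rather than in the docs
(a sketch; host and port are made up, and the level shown mirrors what
socketTextStream uses by default in 1.x):

    import org.apache.spark.SparkConf
    import org.apache.spark.storage.StorageLevel
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf().setAppName("PersistenceCheck")  // hypothetical
    val ssc = new StreamingContext(conf, Seconds(10))

    // Receiver-based inputs take the storage level as an explicit argument,
    // so the effective level is visible at the call site:
    val lines = ssc.socketTextStream("localhost", 9999,
      StorageLevel.MEMORY_AND_DISK_SER_2)

    // A transformed DStream's level can also be set explicitly:
    val words = lines.flatMap(_.split(" "))
    words.persist(StorageLevel.MEMORY_ONLY_SER)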
-----Original Message-----
From: Shao, Saisai [mailto:saisai.s...@intel.com]
Sent: Wednesday, July 23, 2014 3:13 PM
To: user@spark.apache.org
Subject: RE: "spark.streaming.unpersist" and "spark.cleaner.ttl"
Hi Haopu,
Please see the inline comments.
Thanks
Jerry
-----Original Message-----
From: Haopu Wang [mailto:hw...@qilinsoft.com]
Sent: Wednesday, July 23, 2014 3:00 PM
To: user@spark.apache.org
Subject: "spark.streaming.unpersist" and "spark.cleaner.ttl"
I have a DStream receiving data from a socket. I'm using local mode.
I set "spark.streaming.unpersist" to "false" and leave
"spark.cleaner.ttl" as infinite.
I can see files for input and shuffle blocks under the "spark.local.dir"
folder, and the si...
Yes, we are also facing the same problem. The workaround we came up with is:
- store the broadcast variable id when it is first created
- then create a scheduled job which runs every (spark.cleaner.ttl - 1 minute)
  and re-creates the same broadcast variable using the same id.
This way Spark is ...
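A hedged sketch of how such a scheduler might look (names are hypothetical;
since the public API offers no way to re-create a broadcast under the same
id, this variant swaps in a fresh broadcast and unpersists the stale one
instead):

    import java.util.concurrent.{Executors, TimeUnit}
    import java.util.concurrent.atomic.AtomicReference
    import scala.reflect.ClassTag
    import org.apache.spark.SparkContext
    import org.apache.spark.broadcast.Broadcast

    // Periodically re-broadcast before the TTL cleaner can delete the old
    // copy, and swap the reference used by subsequent jobs.
    def scheduleRebroadcast[T: ClassTag](
        sc: SparkContext,
        loadData: () => T,          // hypothetical loader for the broadcast data
        ttlSeconds: Long): AtomicReference[Broadcast[T]] = {
      val ref = new AtomicReference(sc.broadcast(loadData()))
      val period = ttlSeconds - 60  // run one minute before the TTL fires
      val scheduler = Executors.newSingleThreadScheduledExecutor()
      scheduler.scheduleAtFixedRate(new Runnable {
        override def run(): Unit = {
          val fresh = sc.broadcast(loadData())  // create the replacement first
          ref.getAndSet(fresh).unpersist()      // then retire the old copy
        }
      }, period, period, TimeUnit.SECONDS)
      ref                           // jobs read ref.get().value at use sites
    }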
And to answer your original question, spark.cleaner.ttl is not safe for the
exact reason you brought up. The PR Mark linked intends to provide a much
cleaner (and safer) solution.
On Tue, Mar 11, 2014 at 2:01 PM, Mark Hamstra wrote:
> Actually, TD's work-in-progress is probably more ...
> ... system with limited disk space. I believe there's enough space if
> Spark would clean up unused data from previous iterations, but as it
> stands the number of iterations I can run is limited by available disk
> space.
>
> I found a thread on the usage of spark.cleaner.ttl on the old Spark Users
> Google group here:
> https://groups.google.com/forum/#!topic/spark-users/9ebKcNCDih4
> I think this setting may be what I'm looking for; however, the cleaner
> seems to delete data that'...