... state size. To be more specific, our checkpoint size grows to 200GB in 2 weeks.
After upgrading to 1.8.0 and enabling the cleanupInRocksdbCompactFilter TTL config,
the checkpoint size never grows over 10GB.
However, two days after the upgrade, checkpointing started to fail with
"Checkpoint expired before completing".
From the log, I could not get anything useful. But in the Flink UI, the last ...
... posted in dev-mail-list.

Best
Yun Tang

From: Gagan Agrawal
Sent: Thursday, November 1, 2018 13:38
To: myas...@live.com
Cc: happydexu...@gmail.com; user@flink.apache.org
Subject: Re: Savepoint failed with error "Checkpoint expired before completing"

Thanks Yun for your inputs. Yes, increasing the checkpoint timeout helps and we are
able to save savepoints now. In our case we wanted to increase pa...
From: Gagan Agrawal
Sent: Wednesday, October 31, 2018 19:03
To: happydexu...@gmail.com
Cc: user@flink.apache.org
Subject: Re: Savepoint failed with error "Checkpoint expired before completing"

Hi Henry,
Thanks for your response. However, we don't face this issue during normal runs since we
have incremental checkpoints. Only when we try to take a savepoint (which tries to save
the entire state in ...
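For context, the incremental checkpoints mentioned here are a feature of the RocksDB
state backend. A minimal sketch, assuming a placeholder checkpoint URI and interval
(neither comes from the thread):

    import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class IncrementalCheckpointSketch {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // The second argument enables incremental checkpoints: periodic checkpoints
            // upload only the RocksDB delta, whereas a savepoint always writes out the
            // full state, which is why it can take much longer and hit the timeout.
            env.setStateBackend(new RocksDBStateBackend("hdfs:///flink/checkpoints", true));

            env.enableCheckpointing(120_000); // example interval: 2 minutes
            // ... define sources/operators/sinks and call env.execute() here
        }
    }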
    at ...HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
    at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1120)
Caused by: java.util.concurrent.CompletionException: java.util.concurrent.CompletionException: java.lang.Exception: Checkpoint expired before completing
    at org.apache.flink.runtime.jobmaster.JobMaster.lambda$triggerSavepoint$13(JobMaster.java:955)
    at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870)
    at ...
Although there may be no checkpoints in flight with this configuration,
there are most certainly records floating around in various buffers which
filled up while your sink was pausing everything. Those records need to be
processed first before the new checkpoint's checkpoint barrier may make it
through ...
One more question: since I have set "Maximum Concurrent Checkpoints" to 1,
will the cascading effect still apply?
Whenever my sink operator returns to normal (in terms of latency), a new
checkpoint after that point should work, right? There are no other
in-flight/concurrent checkpoints still in progress.
Stephan, thanks a lot for the explanation. Now everything makes sense to
me. Will set the min pause.
On Sat, Dec 2, 2017 at 8:58 AM, Stephan Ewen wrote:
Hi Steven!
You are right, there could be some cascading effect from previous
checkpoints.
I think the best way to handle that is to set the "minimum pause between
checkpoints". In fact, I would actually recommend this over the checkpoint
interval parameter.
The pause will allow the job to handle ...
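A minimal sketch of that recommendation; the interval and pause values below are
illustrative only, the thread does not specify them:

    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class MinPauseSketch {
        public static void main(String[] args) {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            env.enableCheckpointing(120_000); // example checkpoint interval: 2 minutes

            // Guarantee some processing time between two checkpoints, so one slow
            // checkpoint cannot immediately cascade into the next attempt.
            env.getCheckpointConfig().setMinPauseBetweenCheckpoints(60_000); // example: 1 minute
        }
    }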
Here is the checkpoint config: no concurrent checkpoints, with a 2-minute checkpoint
interval and timeout.
The problem is gone after redeployment. I will try to see if I can reproduce the issue.
[screenshot of the checkpoint configuration omitted]
On Fri, Dec 1, 2017 at 6:17 AM, Nico Kruber wrote:
Hi Steven,
by default, checkpoints time out after 10 minutes if you haven't used
CheckpointConfig#setCheckpointTimeout() to change this timeout.
Depending on your checkpoint interval and your number of concurrent
checkpoints, there may already be some other checkpoint processes
running while you ...
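A short sketch of the two settings referred to here; the timeout and interval values
are examples only:

    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class CheckpointTimeoutSketch {
        public static void main(String[] args) {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            env.enableCheckpointing(120_000); // example interval: 2 minutes

            // Raise the default 10-minute timeout so long-running checkpoints/savepoints
            // have more time to complete.
            env.getCheckpointConfig().setCheckpointTimeout(30 * 60 * 1000L);

            // Allow at most one checkpoint in flight at a time.
            env.getCheckpointConfig().setMaxConcurrentCheckpoints(1);
        }
    }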
org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Checkpoint 9353
expired before completing
I might know why this happened in the first place. Our sink operator does a
synchronous HTTP POST, which had a 15-minute latency spike when this all
started. This could block Flink threads and prevent ...
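To illustrate the failure mode described here, the following is a hypothetical sketch
(endpoint and error handling are invented) of a sink doing a synchronous HTTP POST in
invoke(): while that call blocks, the task thread cannot process further records or
forward the checkpoint barrier queued behind them.

    import org.apache.flink.streaming.api.functions.sink.SinkFunction;

    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;

    public class BlockingHttpSink implements SinkFunction<String> {

        private final String endpoint; // hypothetical endpoint, not taken from the thread

        public BlockingHttpSink(String endpoint) {
            this.endpoint = endpoint;
        }

        @Override
        public void invoke(String value) throws Exception {
            HttpURLConnection conn = (HttpURLConnection) new URL(endpoint).openConnection();
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            try (OutputStream out = conn.getOutputStream()) {
                out.write(value.getBytes(StandardCharsets.UTF_8));
            }
            // Synchronous call: the task thread blocks here until the server responds.
            // A latency spike at the endpoint therefore stalls record processing and the
            // checkpoint barriers behind it.
            int status = conn.getResponseCode();
            if (status >= 400) {
                throw new RuntimeException("HTTP POST failed with status " + status);
            }
            conn.disconnect();
        }
    }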