Re: Checkpoint expired before completing with cleanupInRocksdbCompactFilter

2019-05-09 Thread Congxian Qiu
sdbCompactFilter TTL > config, the checkpoint size never grew beyond 10GB. > However, two days after the upgrade, checkpointing started to fail because of the > "Checkpoint expired before completing". > > From the log, I could not get anything useful. > But in the Flink UI, the last

Checkpoint expired before completing with cleanupInRocksdbCompactFilter

2019-05-08 Thread Mu Kong
state size. To be more specific, our checkpoint size grew to 200GB in 2 weeks. After upgrading to 1.8.0 and using the cleanupInRocksdbCompactFilter TTL config, the checkpoint size never grew beyond 10GB. However, two days after the upgrade, checkpointing started to fail because of the "*Checkpoi

Re: Savepoint failed with error "Checkpoint expired before completing"

2018-11-19 Thread Gagan Agrawal
-- >>> *From:* Gagan Agrawal >>> *Sent:* Thursday, November 1, 2018 13:38 >>> *To:* myas...@live.com >>> *Cc:* happydexu...@gmail.com; user@flink.apache.org >>> *Subject:* Re: Savepoint failed with error "Checkpoint

Re: Savepoint failed with error "Checkpoint expired before completing"

2018-11-04 Thread Steven Wu
>> posted in dev-mail-list. >> >> Best >> Yun Tang >> -- >> *From:* Gagan Agrawal >> *Sent:* Thursday, November 1, 2018 13:38 >> *To:* myas...@live.com >> *Cc:* happydexu...@gmail.com; user@flink.apache.org >> *Su

Re: Savepoint failed with error "Checkpoint expired before completing"

2018-11-02 Thread Gagan Agrawal
pydexu...@gmail.com; user@flink.apache.org > *Subject:* Re: Savepoint failed with error "Checkpoint expired before > completing" > > Thanks Yun for your inputs. Yes, increasing checkpoint helps and we are > able to take savepoints now. In our case we wanted to increase pa

Re: Savepoint failed with error "Checkpoint expired before completing"

2018-11-01 Thread Yun Tang
l-list. Best Yun Tang From: Gagan Agrawal Sent: Thursday, November 1, 2018 13:38 To: myas...@live.com Cc: happydexu...@gmail.com; user@flink.apache.org Subject: Re: Savepoint failed with error "Checkpoint expired before completing" Thanks Yun for

Re: Savepoint failed with error "Checkpoint expired before completing"

2018-10-31 Thread Gagan Agrawal
> *From:* Gagan Agrawal > *Sent:* Wednesday, October 31, 2018 19:03 > *To:* happydexu...@gmail.com > *Cc:* user@flink.apache.org > *Subject:* Re: Savepoint failed with error "Checkpoint expired before > completing" > > Hi Henry, >

Re: Savepoint failed with error "Checkpoint expired before completing"

2018-10-31 Thread Yun Tang
he.org Subject: Re: Savepoint failed with error "Checkpoint expired before completing" Hi Henry, Thanks for your response. However, we don't face this issue during normal runs, as we have incremental checkpoints. Only when we try to take a savepoint (which tries to save the entire state in

Re: Savepoint failed with error "Checkpoint expired before completing"

2018-10-31 Thread Gagan Agrawal
ntext.runSecured(HadoopSecurityContext.java:41) > > at > org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1120) > > Caused by: java.util.concurrent.CompletionException: > java.util.concurrent.CompletionException: java.lang.Exception: Checkpoin

Re: Savepoint failed with error "Checkpoint expired before completing"

2018-10-30 Thread 徐涛
d.java:1120) > Caused by: java.util.concurrent.CompletionException: > java.util.concurrent.CompletionException: java.lang.Exception: Checkpoint > expired before completing > at > org.apache.flink.runtime.jobmaster.JobMaster.lambda$triggerSavepoint$13(JobMaster.java:955) >

Savepoint failed with error "Checkpoint expired before completing"

2018-10-30 Thread Gagan Agrawal
: java.util.concurrent.CompletionException: java.lang.Exception: Checkpoint expired before completing at org.apache.flink.runtime.jobmaster.JobMaster.lambda$triggerSavepoint$13(JobMaster.java:955) at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870) at

Re: Checkpoint expired before completing

2017-12-04 Thread Nico Kruber
Although there may be no checkpoints in flight with this configuration, there are most certainly records floating around in various buffers which filled up while your sink was pausing everything. Those records need to be processed first before the new checkpoint's checkpoint barrier may make it throug

Re: Checkpoint expired before completing

2017-12-02 Thread Steven Wu
One more question. Since I have set "Maximum Concurrent Checkpoints" to 1, will the cascading effect still apply? Once my sink operator returns to normal (in terms of latency), a new checkpoint after this point should work, right? There are no other in-flight/concurrent checkpoints still in pr

Re: Checkpoint expired before completing

2017-12-02 Thread Steven Wu
Stephan, thanks a lot for the explanation. Now everything makes sense to me. Will set the min pause. On Sat, Dec 2, 2017 at 8:58 AM, Stephan Ewen wrote: > Hi Steven! > > You are right, there could be some cascading effect from previous > checkpoints. > I think the best way to handle that is to s

Re: Checkpoint expired before completing

2017-12-02 Thread Stephan Ewen
Hi Steven! You are right, there could be some cascading effect from previous checkpoints. I think the best way to handle that is to set the "minimum pause between checkpoints". In fact, I would actually recommend this over the checkpoint interval parameter. The pause will allow the job to handle
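The effect of a minimum pause can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Flink's actual scheduler: at most one checkpoint is in flight, the next trigger is taken as the later of "previous start + interval" and "previous end + minimum pause", and the function names and durations are hypothetical.

```python
def trigger_times(durations, interval, min_pause=0.0):
    """Return (start, end) pairs for checkpoints, with at most one in flight.

    Simplified model (an assumption, not Flink's real scheduler): the next
    checkpoint starts no earlier than one interval after the previous start,
    and no earlier than the previous end plus the minimum pause.
    """
    schedule = []
    start = 0.0
    for duration in durations:
        end = start + duration
        schedule.append((start, end))
        start = max(start + interval, end + min_pause)
    return schedule


def idle_gaps(schedule):
    """Time between the end of one checkpoint and the start of the next."""
    return [nxt_start - end
            for (_, end), (nxt_start, _) in zip(schedule, schedule[1:])]


# Hypothetical checkpoint durations in seconds: two slow ones in the middle.
durations = [30.0, 150.0, 150.0, 30.0]
interval = 120.0

no_pause = idle_gaps(trigger_times(durations, interval))
with_pause = idle_gaps(trigger_times(durations, interval, min_pause=60.0))
print(no_pause)    # checkpoint-free time between checkpoints, interval only
print(with_pause)  # the same with a 60-second minimum pause
```

In this model, a checkpoint that runs longer than the interval is followed immediately by the next one when only the interval is set, so the job gets no checkpoint-free time to catch up. With a minimum pause, every gap is at least as long as the pause.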

Re: Checkpoint expired before completing

2017-12-01 Thread Steven Wu
Here is the checkpoint config: no concurrent checkpoints, with a 2-minute checkpoint interval and timeout. The problem is gone after redeployment. I will see whether I can reproduce the issue. On Fri, Dec 1, 2017 at 6:17 AM, Nico Kruber wrote: > Hi Steven, > by default, checkpoints

Re: Checkpoint expired before completing

2017-12-01 Thread Nico Kruber
Hi Steven, by default, checkpoints time out after 10 minutes if you haven't used CheckpointConfig#setCheckpointTimeout() to change this timeout. Depending on your checkpoint interval, and your number of concurrent checkpoints, there may already be some other checkpoint processes running while you
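To make the timeout arithmetic concrete, here is a rough toy model using the figures reported in this thread: a 2-minute interval and timeout, and a sink stall of about 15 minutes. It assumes that a checkpoint overlapping the stall cannot finish until the stall ends, and it ignores the backlog of buffered records that still has to drain afterwards. The function name and the 10-second normal checkpoint duration are hypothetical.

```python
def expired_checkpoints(triggers, stall_start, stall_end,
                        normal_duration, timeout):
    """Return the trigger times of checkpoints that exceed `timeout`.

    Toy model (an assumption): a checkpoint that overlaps the stall window
    can only finish `normal_duration` seconds after the stall ends.
    """
    expired = []
    for t in triggers:
        overlaps_stall = t < stall_end and t + normal_duration > stall_start
        if overlaps_stall:
            finish = max(t + normal_duration, stall_end + normal_duration)
        else:
            finish = t + normal_duration
        if finish - t > timeout:
            expired.append(t)
    return expired


# All times in seconds. Triggers every 2 minutes for 20 minutes.
triggers = list(range(0, 1200, 120))

# 15-minute stall starting at t=0, with a 2-minute timeout.
expired = expired_checkpoints(triggers, stall_start=0, stall_end=900,
                              normal_duration=10, timeout=120)
print(expired)

# Same stall, but with a timeout longer than the stall itself.
none_expired = expired_checkpoints(triggers, stall_start=0, stall_end=900,
                                   normal_duration=10, timeout=960)
print(none_expired)
```

Under these assumptions, every checkpoint triggered early in the stall expires with the 2-minute timeout, while one triggered shortly before the stall ends can still complete in time.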

Checkpoint expired before completing

2017-11-30 Thread Steven Wu
org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Checkpoint 9353 expired before completing I might know why this happened in the first place. Our sink operator does a synchronous HTTP POST, which had a 15-minute latency spike when this all started. This could block Flink threads and prevent