Re: ***UNCHECKED*** Error while confirming Checkpoint

2019-11-28 Thread Piotr Nowojski
Thank you all for the investigation, reporting and discussion. I have merged an older PR [1] that fixes this issue; it was previously rejected because we didn’t realise this was a production issue. The fix should be included in the Flink 1.10, 1.9.2 and 1.8.3 releases. Piotrek [1] https:/

Re: ***UNCHECKED*** Error while confirming Checkpoint

2019-11-27 Thread Tony Wei
Hi Piotrek, There was already an issue [1] and a PR for this thread. Should we mark it as a duplicate or as a related issue? Best, Tony Wei [1] https://issues.apache.org/jira/browse/FLINK-10377 On Thu, Nov 28, 2019 at 12:17 AM, Piotr Nowojski wrote: > Hi Tony, > > Thanks for the explanation. Assuming that’s what’

Re: ***UNCHECKED*** Error while confirming Checkpoint

2019-11-27 Thread Piotr Nowojski
Hi Tony, Thanks for the explanation. Assuming that’s what’s happening, then I agree, this checkState should be removed. I created a ticket for this issue: https://issues.apache.org/jira/browse/FLINK-14979 Piotrek > On 27 Nov 2019, at 16:28, T

Re: ***UNCHECKED*** Error while confirming Checkpoint

2019-11-27 Thread Tony Wei
Hi Piotrek, The case here was that the first snapshot was a savepoint. I know that if a later checkpoint succeeds before the previous one, the previous one is subsumed by the JobManager. However, if that previous one is a savepoint, it won't be subsumed. That leads to the case that Chesnay

Re: ***UNCHECKED*** Error while confirming Checkpoint

2019-11-27 Thread Piotr Nowojski
Hi, Maybe you are right, Chesnay, but I’m not sure. TwoPhaseCommitSink was based on Pravega’s sink for Flink, which was implemented by Stephan, and it has the same logic [1]. If I remember the discussions with Stephan/Till correctly, the way Flink uses Akka probably guarantees that messages will b

Re: ***UNCHECKED*** Error while confirming Checkpoint

2019-11-27 Thread Chesnay Schepler
This looks to me like the TwoPhaseCommitSinkFunction is a bit too strict. The notification for completed checkpoints is not reliable; it may be late, not come at all, or possibly even arrive in a different order than expected. As such, if you have a simple case of snapshot -> snapshot -> notify -> notify the s
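
A minimal sketch of how a strict precondition can trip when a completion notification arrives late. This is a simplified model, not Flink's TwoPhaseCommitSinkFunction: the class StrictSinkSketch, its methods and the exception message are all hypothetical, and transactions are plain strings keyed by checkpoint ID.

```java
import java.util.TreeMap;

// Hypothetical, simplified model of a two-phase-commit sink (not Flink code).
class StrictSinkSketch {
    // Pending transactions, ordered by the checkpoint ID that created them.
    private final TreeMap<Long, String> pending = new TreeMap<>();

    // Called when a snapshot (checkpoint or savepoint) is taken.
    void snapshot(long checkpointId) {
        pending.put(checkpointId, "txn-" + checkpointId);
    }

    // Strict variant: assumes every notification still finds a pending transaction.
    void notifyCheckpointComplete(long checkpointId) {
        if (pending.isEmpty()) {
            throw new IllegalStateException(
                "no pending transaction for checkpoint " + checkpointId);
        }
        // Commit (here: simply drop) every transaction up to and including this ID.
        pending.headMap(checkpointId, true).clear();
    }

    int pendingCount() {
        return pending.size();
    }

    public static void main(String[] args) {
        StrictSinkSketch sink = new StrictSinkSketch();
        sink.snapshot(1);
        sink.snapshot(2);
        sink.notifyCheckpointComplete(2); // commits transactions 1 and 2
        try {
            sink.notifyCheckpointComplete(1); // late notification finds nothing pending
        } catch (IllegalStateException e) {
            System.out.println("late notification failed: " + e.getMessage());
        }
    }
}
```

In this model the second notification fails only because the first one already committed everything up to checkpoint 2; the data itself is consistent.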

Re: ***UNCHECKED*** Error while confirming Checkpoint

2019-11-27 Thread Tony Wei
Hi, As a follow-up, it seems that a savepoint can't be subsumed, so its notification could still be sent to each TM. Is this a bug that needs to be fixed in TwoPhaseCommitSinkFunction? Best, Tony Wei On Wed, Nov 27, 2019 at 3:43 PM, Tony Wei wrote: > Hi, > > I want to raise this question again, since I

Re: ***UNCHECKED*** Error while confirming Checkpoint

2019-11-26 Thread Tony Wei
Hi, I want to raise this question again, since I have hit this exception in my production job. The exception is as follows: > 2019-11-27 14:47:29 java.lang.RuntimeException: Error while confirming checkpoint > at org.apache.flink.runtime.taskmanager.Task$2.run(Task.java:1205) > at jav

Re: ***UNCHECKED*** Error while confirming Checkpoint

2018-10-08 Thread Stefan Richter
Hi Pedro, unfortunately the interesting parts have all been removed from the log; we already know about the exception itself. In particular, what I would like to see is which checkpoints were triggered and completed before the exception happened. Best, Stefan > On 08.10.2018 at 10:23, Pedr

Re: ***UNCHECKED*** Error while confirming Checkpoint

2018-10-08 Thread PedroMrChaves
Hello, Find attached the jobmanager.log. I've omitted the log lines from other runs, leaving only the job manager info and the run with the error. Thanks again for your help. Regards

Re: ***UNCHECKED*** Error while confirming Checkpoint

2018-09-25 Thread Stefan Richter
Hi, I cannot spot anything bad or "wrong" about your job configuration. Maybe you can try to save and send the logs if it happens again? Did you observe this only once, often, or is it something that is even reproducible? Best, Stefan > On 24.09.2018 at 10:15, PedroMrChaves wrote: > > Hell

Re: ***UNCHECKED*** Error while confirming Checkpoint

2018-09-24 Thread PedroMrChaves
Hello Stefan, Thank you for the help. I've actually lost those logs due to several cluster restarts that we did, which caused the logs to rotate out (limit = 5 versions). The log lines that I've posted were the only ones that showed signs of a problem. *The configuration of the job is as follows:

Re: ***UNCHECKED*** Error while confirming Checkpoint

2018-09-21 Thread Stefan Richter
Hi, could you provide some logs for this problematic job? I would like to double-check why this precondition was actually violated. Thanks, Stefan > On 20.09.2018 at 17:24, Stefan Richter wrote: > > FYI, here is a link to my PR: https://github.com/apache/flink/pull/6723

Re: ***UNCHECKED*** Error while confirming Checkpoint

2018-09-20 Thread Stefan Richter
FYI, here is a link to my PR: https://github.com/apache/flink/pull/6723 > On 20.09.2018 at 14:52, Stefan Richter wrote: > > Hi, > > I think the failing precondition is too strict because sometimes a checkpoint > can overtake another checkpoint and in that case the commit is already > subsumed.

Re: ***UNCHECKED*** Error while confirming Checkpoint

2018-09-20 Thread Stefan Richter
Hi, I think the failing precondition is too strict because sometimes a checkpoint can overtake another checkpoint, and in that case the commit is already subsumed. I will open a Jira and a PR with a fix. Best, Stefan > On 19.09.2018 at 10:04, PedroMrChaves wrote: > > Hello, > > I have a runni
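
One way to picture a less strict check is the sketch below. It is a simplified model under stated assumptions, not the code from the PR mentioned in this thread: the class TolerantSinkSketch and its methods are hypothetical, and a "commit" just moves a string into a list. A notification for a checkpoint whose transaction was already committed by a later checkpoint is treated as a no-op instead of an error.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical, simplified model of a tolerant two-phase-commit sink (not Flink code).
class TolerantSinkSketch {
    // Pending transactions, ordered by the checkpoint ID that created them.
    private final TreeMap<Long, String> pending = new TreeMap<>();
    private final List<String> committed = new ArrayList<>();

    // Called when a snapshot is taken: remember the transaction for later commit.
    void snapshot(long checkpointId) {
        pending.put(checkpointId, "txn-" + checkpointId);
    }

    // Commit everything up to and including checkpointId.
    // If nothing is pending (already subsumed), this does nothing.
    void notifyCheckpointComplete(long checkpointId) {
        NavigableMap<Long, String> ready = pending.headMap(checkpointId, true);
        committed.addAll(ready.values());
        ready.clear(); // the view is backed by 'pending', so this removes the entries
    }

    List<String> committed() {
        return committed;
    }

    public static void main(String[] args) {
        TolerantSinkSketch sink = new TolerantSinkSketch();
        sink.snapshot(1);
        sink.snapshot(2);
        sink.notifyCheckpointComplete(2); // checkpoint 2 overtakes checkpoint 1
        sink.notifyCheckpointComplete(1); // already subsumed: no-op, no exception
        System.out.println(sink.committed()); // prints [txn-1, txn-2]
    }
}
```

The design choice here is to make the commit idempotent with respect to notifications, so a late or reordered notification cannot fail the job.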