Re: post-checkpoint watermark out of sync with event stream?

Aljoscha Krettek Wed, 15 Apr 2020 08:13:28 -0700

Hi Cliff,

On 14.04.20 19:29, Cliff Resnick wrote

I'm wondering how this could be possible. The only explanation I can think
of is:


4. on "endTime" timer key state is purged.
5 --- job fail ---
6.  job restarted on 2.5 hour old Savepoint
7.  watermark regresses (?) from "endTime" watermark.
8. a long tail event squeaks through under temporarily backdated watermark
9. data store data for key is replaced with long tail data,

Is the above possible, or perhaps there is another possible scenario? Any
opinions appreciated!

Yes, I'm quite sure this is possible. The reason is that watermarks are*not* checkpointed at operators but are purely driven by data that comesthrough. When a pipeline is (re)started all operators will have -Inf asthe current watermark, even when starting from a checkpoint/savepoint.

We didn't (so far) want to checkpoint the watermark because this wouldintroduce additional complexity in handling that state. All operatorswould be stateful, and it's not trivial (though certainly possible) todetermine the watermark of an operator in a scale in/out scenario.


Is this a big problem for you?

Btw, there's this Jira issue about a symptom of this:https://issues.apache.org/jira/browse/FLINK-5601. You can see that thediscussion fizzled out from my side, mostly because of the mentionedcomplexities. But it might be that we need to try and find a solutionfor this.


Best,
Aljoscha

Re: post-checkpoint watermark out of sync with event stream?

Reply via email to