I don't think barries can "expire" as of now. Might be a nice idea thought -- I don't know if this might be a problem in production.
Furthermore, I want to point out, that an "expiring checkpoint" would not break exactly-once processing, as the latest successful checkpoint can always be used to recover correctly. Only the recovery-time would be increase. because if a "barrier expires" and no checkpoint can be stored, more data has to be replayed using the "old" checkpoint". -Matthias On 05/12/2016 09:21 PM, Srikanth wrote: > Hello, > > I was reading about Flink's checkpoint and wanted to check if I > correctly understood the usage of barriers for exactly once processing. > 1) Operator does alignment by buffering records coming after a barrier > until it receives barrier from all upstream operators instances. > 2) Barrier is always preceded by a watermark to trigger processing all > windows that are complete. > 3) Records in windows that are not triggered are also saved as part of > checkpoint. These windows are repopulated when restoring from checkpoints. > > In production setups, were there any cases where alignment during > checkpointing caused unacceptable latency? > If so, is there a way to indicate say wait for a MAX 100 ms? That way we > have exactly-once in most situations but prefer at least once over > higher latency in corner cases. > > Srikanth
signature.asc
Description: OpenPGP digital signature