Hi Srikanth!

That is an interesting idea.
I have it on my mind to create a design doc for checkpointing improvements.
That could be added as a proposal there.

I hope I'll be able to start with that design doc next week.

Greetings,
Stephan


On Fri, May 13, 2016 at 1:35 PM, Matthias J. Sax <mj...@apache.org> wrote:

> I don't think barriers can "expire" as of now. Might be a nice idea
> though -- I don't know if this might be a problem in production.
>
> Furthermore, I want to point out that an "expiring checkpoint" would
> not break exactly-once processing, as the latest successful checkpoint
> can always be used to recover correctly. Only the recovery time would
> increase, because if a "barrier expires" and no checkpoint can be
> stored, more data has to be replayed from the "old" checkpoint.
>
>
> -Matthias
>
> On 05/12/2016 09:21 PM, Srikanth wrote:
> > Hello,
> >
> > I was reading about Flink's checkpoint and wanted to check if I
> > correctly understood the usage of barriers for exactly once processing.
> >  1) Operator does alignment by buffering records coming after a barrier
> > until it receives the barrier from all upstream operator instances.
> >  2) Barrier is always preceded by a watermark to trigger processing all
> > windows that are complete.
> >  3) Records in windows that are not triggered are also saved as part of
> > checkpoint. These windows are repopulated when restoring from checkpoints.
> >
> > In production setups, were there any cases where alignment during
> > checkpointing caused unacceptable latency?
> > If so, is there a way to specify a maximum wait, say 100 ms? That way we
> > have exactly-once in most situations but prefer at-least-once over
> > higher latency in corner cases.
> >
> > Srikanth
>
>
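
To make the idea in this thread concrete, here is a minimal toy sketch of barrier alignment with an optional "expiry". It is not Flink code and does not use any Flink API: the class BarrierAligner, its methods, and the max_align_ms setting are all hypothetical names made up for illustration. It follows the variant Matthias describes, where an expired barrier simply means the checkpoint is abandoned and the buffered records are released, so recovery would fall back to the previous successful checkpoint. It also assumes only one checkpoint is being aligned at a time, and it checks the timeout only when the next record or barrier arrives (a real operator would presumably use a timer).

```python
from typing import Any, List, Optional, Set, Tuple

# ("process", record) | ("checkpoint", checkpoint_id) | ("expired", checkpoint_id)
Event = Tuple[str, Any]


class BarrierAligner:
    """Toy model of one operator aligning barriers across its input channels."""

    def __init__(self, num_channels: int, max_align_ms: Optional[int] = None):
        self.num_channels = num_channels
        self.max_align_ms = max_align_ms      # hypothetical setting; None = wait forever
        self.blocked: Set[int] = set()        # channels whose barrier has already arrived
        self.buffered: List[Any] = []         # records held back during alignment
        self.align_start_ms: Optional[int] = None
        self.current_id: Optional[int] = None
        self.done_up_to = 0                   # highest checkpoint id completed or abandoned

    def _release(self, marker: str) -> List[Event]:
        # End the alignment: emit the marker, then the records that were held back.
        events: List[Event] = [(marker, self.current_id)]
        events += [("process", r) for r in self.buffered]
        self.done_up_to = self.current_id
        self.blocked.clear()
        self.buffered.clear()
        self.align_start_ms = None
        self.current_id = None
        return events

    def _expire_if_due(self, now_ms: int) -> List[Event]:
        # Abandon the checkpoint if alignment has lasted longer than the limit.
        if (self.max_align_ms is not None
                and self.align_start_ms is not None
                and now_ms - self.align_start_ms > self.max_align_ms):
            return self._release("expired")
        return []

    def on_record(self, channel: int, record: Any, now_ms: int) -> List[Event]:
        events = self._expire_if_due(now_ms)
        if channel in self.blocked:
            self.buffered.append(record)      # arrived after this channel's barrier
        else:
            events.append(("process", record))
        return events

    def on_barrier(self, channel: int, checkpoint_id: int, now_ms: int) -> List[Event]:
        events = self._expire_if_due(now_ms)
        if checkpoint_id <= self.done_up_to:
            return events                     # late barrier of a finished checkpoint: ignore
        if not self.blocked:
            self.align_start_ms = now_ms      # first barrier starts the alignment
            self.current_id = checkpoint_id
        self.blocked.add(channel)
        if len(self.blocked) == self.num_channels:
            events += self._release("checkpoint")  # all barriers in: snapshot point
        return events
```

For example, with BarrierAligner(2, max_align_ms=100), a barrier on channel 0 at t=0 followed by a record on channel 1 at t=150 abandons the checkpoint and releases whatever was buffered from channel 0. The late barrier that eventually arrives on channel 1 is ignored. Srikanth's alternative (still taking the snapshot but accepting at-least-once) would instead emit the checkpoint marker at expiry; the sketch does not model that.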
