Thanks. I didn't know we could set that. On Fri, May 13, 2016 at 12:44 PM, Stephan Ewen <se...@apache.org> wrote:
> You can use the checkpoint mode to "at least once". > That way, barriers never block. > > On Fri, May 13, 2016 at 6:05 PM, Srikanth <srikanth...@gmail.com> wrote: > >> I have a follow up. Is there a recommendation of list of knobs that can >> be tuned if at least once guarantee while handling failure is good enough? >> For cases like alert generation, non idempotent sink, etc where the >> system can live with duplicates or has other mechanism to handle them. >> >> Srikanth >> >> On Fri, May 13, 2016 at 8:09 AM, Stephan Ewen <se...@apache.org> wrote: >> >>> Hi Srikanth! >>> >>> That is an interesting idea. >>> I have it on my mind to create a design doc for checkpointing >>> improvements. That could be added as a proposal there. >>> >>> I hope I'll be able to start with that design doc next week. >>> >>> Greetings, >>> Stephan >>> >>> >>> On Fri, May 13, 2016 at 1:35 PM, Matthias J. Sax <mj...@apache.org> >>> wrote: >>> >>>> I don't think barries can "expire" as of now. Might be a nice idea >>>> thought -- I don't know if this might be a problem in production. >>>> >>>> Furthermore, I want to point out, that an "expiring checkpoint" would >>>> not break exactly-once processing, as the latest successful checkpoint >>>> can always be used to recover correctly. Only the recovery-time would be >>>> increase. because if a "barrier expires" and no checkpoint can be >>>> stored, more data has to be replayed using the "old" checkpoint". >>>> >>>> >>>> -Matthias >>>> >>>> On 05/12/2016 09:21 PM, Srikanth wrote: >>>> > Hello, >>>> > >>>> > I was reading about Flink's checkpoint and wanted to check if I >>>> > correctly understood the usage of barriers for exactly once >>>> processing. >>>> > 1) Operator does alignment by buffering records coming after a >>>> barrier >>>> > until it receives barrier from all upstream operators instances. >>>> > 2) Barrier is always preceded by a watermark to trigger processing >>>> all >>>> > windows that are complete. >>>> > 3) Records in windows that are not triggered are also saved as part >>>> of >>>> > checkpoint. These windows are repopulated when restoring from >>>> checkpoints. >>>> > >>>> > In production setups, were there any cases where alignment during >>>> > checkpointing caused unacceptable latency? >>>> > If so, is there a way to indicate say wait for a MAX 100 ms? That way >>>> we >>>> > have exactly-once in most situations but prefer at least once over >>>> > higher latency in corner cases. >>>> > >>>> > Srikanth >>>> >>>> >>> >> >