Thank you for the clarification. That is what I was thinking. Regards.
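[The thread below turns on the interplay between the producer-side `transaction.timeout.ms` and the broker-side cap `transaction.max.timeout.ms` (Kafka's broker default is 15 minutes, while Flink's Kafka producer defaults to 1 hour, so the broker cap usually has to be raised for exactly-once jobs). A minimal stdlib-only sketch of that constraint; the class name, the placeholder bootstrap address, and the assumed 24-hour broker cap are illustrative, not Flink or Kafka API:]

```java
import java.util.Properties;

public class TxnTimeoutConfig {
    // Assumed broker setting: transaction.max.timeout.ms raised from the
    // 15-minute default to 24 hours, as discussed in the thread.
    static final long BROKER_MAX_TIMEOUT_MS = 24L * 60 * 60 * 1000;

    /** Build producer properties, enforcing the broker-side cap up front.
     *  The broker rejects InitProducerId requests whose requested
     *  transaction.timeout.ms exceeds transaction.max.timeout.ms. */
    static Properties producerProps(long transactionTimeoutMs) {
        if (transactionTimeoutMs > BROKER_MAX_TIMEOUT_MS) {
            throw new IllegalArgumentException(
                "transaction.timeout.ms must not exceed the broker's transaction.max.timeout.ms");
        }
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "kafka:9092"); // placeholder address
        props.setProperty("transaction.timeout.ms", Long.toString(transactionTimeoutMs));
        return props;
    }

    public static void main(String[] args) {
        // A 6-hour producer timeout: dangling transactions survive (and block
        // downstream read-committed consumers) for up to 6 hours of downtime.
        Properties props = producerProps(6L * 60 * 60 * 1000);
        System.out.println(props.getProperty("transaction.timeout.ms"));
    }
}
```

The point of the guard is the one made below: the producer's requested timeout must fit under the broker cap, and both must comfortably exceed the worst-case downtime of the pipe.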
On Sat, Apr 17, 2021 at 3:24 AM Arvid Heise <ar...@apache.org> wrote:

> Hi Vishal,
>
> No, AFAIK Flink only cancels transactions on restore from a checkpoint or
> savepoint (manual or automatic). That's what I meant with:
>
>> they may linger for a longer time if you stop an application entirely
>> (for example for an upgrade).
>
> Your upgraded application will hopefully be up before the Kafka
> transactions time out.
>
> However, my statement was a bit misleading: if you gracefully stop the
> Flink application (stop with drain), then Flink should also close all
> transactions. You'd only get lingering transactions when you cancel the
> Flink application and start from scratch (e.g., after an error in the
> business logic). But then you usually also recreate the topic, which
> hopefully also removes all pending transactions.
>
> On Fri, Apr 16, 2021 at 10:28 PM Vishal Santoshi <vishal.santo...@gmail.com> wrote:
>
>> Thanks for the feedback, and glad I am on the right track.
>>
>>> Outstanding transactions should be automatically aborted on restart by
>>> Flink.
>>
>> Let me understand this:
>>
>> 1. A Flink pipe is cancelled and has dangling Kafka transactions.
>> 2. A new Flink pipe (not restored from a checkpoint or savepoint) is
>> started, which is essentially the same pipe as in 1 but does not restore.
>> Would the dangling Kafka transactions be aborted?
>>
>> If yes, how does it work? As in, how does the new pipe know which
>> transactions to abort? Does it ask Kafka for pending transactions and
>> know which ones belong to the first pipe (maybe because they share some
>> id derived from the name of the pipe, or something else)?
>>
>> Thanks again,
>>
>> Vishal
>>
>> On Fri, Apr 16, 2021 at 1:37 PM Arvid Heise <ar...@apache.org> wrote:
>>
>>> Hi Vishal,
>>>
>>> I think you pretty much nailed it.
>>>
>>> Outstanding transactions should be automatically aborted on restart by
>>> Flink. Flink (re)uses a pool of transactional ids, such that all
>>> possible transactions by Flink are canceled on restart.
>>>
>>> I guess the biggest downside of using a large transaction timeout is
>>> that other clients might leak transactions for a longer period of time,
>>> or that they may linger for a longer time if you stop an application
>>> entirely (for example for an upgrade).
>>>
>>> On Fri, Apr 16, 2021 at 4:08 PM Vishal Santoshi <vishal.santo...@gmail.com> wrote:
>>>
>>>> Hello folks,
>>>>
>>>> So AFAIK, data loss on exactly-once will happen if:
>>>>
>>>> - a transaction is started on Kafka,
>>>> - the pre-commit is done (Kafka is prepared for the commit),
>>>> - the commit fails (Kafka went down, or a network issue, or whatever),
>>>>   so Kafka has an uncommitted transaction,
>>>> - the pipe was down for, say, n minutes and the Kafka transaction
>>>>   timeout is m minutes, where m < n,
>>>> - the pipe restarts and tries to commit an already-aborted transaction,
>>>>   fails, and thus loses data.
>>>>
>>>> Thus it is imperative that transaction.max.timeout.ms on Kafka is a
>>>> high value (like n hours), which should be greater than the SLA for
>>>> downtime of the pipe. As in, we have to ensure that the pipe is
>>>> restarted before the transaction.timeout.ms set on the producer (capped
>>>> by the broker's transaction.max.timeout.ms) expires.
>>>>
>>>> The impulse is to make transaction.max.timeout.ms high (24 hours). The
>>>> only implication is what happens if we start a brand new pipeline on
>>>> the same topics, which still have unresolved transactions, mostly
>>>> because of the extended timeout of a previous pipe. I would assume we
>>>> are delayed then, given that Kafka will stall subsequent transactions
>>>> from being visible to the consumer because of this one outstanding
>>>> transaction?
>>>>
>>>> And if that is the case, then understandably we have to abort those
>>>> dangling transactions before the 24-hour timeout. While there is
>>>> probably a way to do that, does Flink help? As in, can we set a
>>>> property that will abort a transaction on Kafka, because we need it to,
>>>> given the above?
>>>>
>>>> Again, I might have totally misunderstood the whole mechanics; if so,
>>>> apologies, and I will appreciate some clarifications.
>>>>
>>>> Thanks.
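[The "pool of transaction ids" Arvid mentions is the mechanism that answers Vishal's question: Kafka's InitProducerId request bumps the producer epoch for a given transactional id and aborts any transaction left open under it, so a restarted job that deterministically regenerates the same ids fences out and aborts its predecessor's dangling transactions. A stdlib-only sketch of such deterministic id generation; the `prefix-subtask-slot` format here is illustrative, not Flink's actual internal naming scheme:]

```java
import java.util.ArrayList;
import java.util.List;

public class TransactionalIds {
    /** Deterministically derive the pool of transactional ids for a job.
     *  Because the same (prefix, parallelism, poolSize) inputs always yield
     *  the same ids, a restarted job re-initializes producers under the same
     *  ids, which causes the broker to abort any open transactions left by
     *  the previous run. */
    static List<String> transactionalIdPool(String prefix, int parallelism, int poolSize) {
        List<String> ids = new ArrayList<>();
        for (int subtask = 0; subtask < parallelism; subtask++) {
            for (int slot = 0; slot < poolSize; slot++) {
                ids.add(prefix + "-" + subtask + "-" + slot);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        // A restarted job with the same prefix reclaims the same ids.
        for (String id : transactionalIdPool("my-pipe", 2, 3)) {
            System.out.println(id);
        }
    }
}
```

This is also why a brand new pipeline under a different prefix does not abort the old pipe's transactions: it claims different ids, so the old transactions linger until the broker-side timeout, or until they are aborted out of band.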