Hello folks So AFAIK data loss on exactly once will happen if
- start a transaction on kafka. - pre commit done ( kafka is prepared for the commit ) - commit fails ( kafka went own or n/w issue or what ever ). kafka has an uncommitted transaction - pipe was down for say n minutes and the kafka based transaction time out is m minutes, where m < n - the pipe restarts and tries to commit an aborted transaction and fails and thus data loss Thus it is imperative that the ransaction.max.timeout.ms out on kafka is a high value ( like n hours ) which should be greater then an SLA for downtime of the pipe. As in we have to ensure that the pipe is restarted before the transaction.timeout.ms set on the broker. The impulse is to make ransaction.max.timeout.ms high ( 24 hours ). The only implication is what happens if we start a brand new pipeline on the same topics which has yet to be resolved transactions, mostly b’coz of extended timeout of a previous pipe .. I would assume we are delayed then given that kafka will stall subsequent transactions from being visible to the consumer, b'coz of this one outstanding trsnasaction ? And if that is the case, then understandably we have to abort those dangling transactions before the 24 hrs time out. While there probably a way to do that, does flink help.. as in set a property that will abort a transaction on kafka, b'coz we need it to, given the above.. Again I might have totally misunderstood the whole mechanics and if yes apologies and will appreciate some clarifications. Thanks.