Hello folks

So AFAIK, data loss with exactly-once semantics will happen if:

   - a transaction is started on Kafka
   - the pre-commit is done (Kafka is prepared for the commit)
   - the commit fails (Kafka went down, a network issue, or whatever), so
     Kafka has an uncommitted transaction
   - the pipe is down for, say, n minutes while the Kafka transaction
     timeout is m minutes, where m < n
   - the pipe restarts, tries to commit an already-aborted transaction, and
     fails; thus data loss (the sketch after this list shows the produce
     flow involved)
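For reference, a minimal sketch of the produce flow the list above
describes, using the plain Kafka Java producer API (broker address, topic
name, and transactional.id are placeholders):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class TxnFlow {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder
        props.setProperty("transactional.id", "pipe-sink-0");     // placeholder
        props.setProperty("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.setProperty("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        producer.initTransactions();   // register the transactional.id
        producer.beginTransaction();   // start a transaction on Kafka
        producer.send(new ProducerRecord<>("my-topic", "key", "value"));
        // If the process dies here, or commitTransaction() fails and the
        // pipe stays down past the transaction timeout, the broker aborts
        // the open transaction on its own.
        producer.commitTransaction();
        producer.close();
    }
}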

Thus it is imperative that transaction.max.timeout.ms on the Kafka broker
is a high value (like n hours), greater than the SLA for downtime of the
pipe. As in, we have to ensure that the pipe is restarted before the
transaction timeout expires (the producer's transaction.timeout.ms, which
the broker caps at transaction.max.timeout.ms).
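A sketch of the two knobs involved; the values here are illustrative, not
recommendations:

import java.util.Properties;

public class TimeoutKnobs {
    public static void main(String[] args) {
        // Producer side: this config is handed to the exactly-once Kafka
        // sink. transaction.timeout.ms must be <= the broker's
        // transaction.max.timeout.ms, or producer init fails; raise both.
        Properties producerProps = new Properties();
        producerProps.setProperty("bootstrap.servers", "localhost:9092"); // placeholder
        producerProps.setProperty("transaction.timeout.ms",
                String.valueOf(6 * 60 * 60 * 1000)); // e.g. 6 h, > downtime SLA

        // Broker side (server.properties, not set from Java):
        //   transaction.max.timeout.ms=21600000   # must be >= producer value
    }
}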

The impulse is to make transaction.max.timeout.ms high (24 hours). The only
implication is what happens if we start a brand new pipeline on the same
topics while they still have unresolved transactions, mostly because of the
extended timeout of a previous pipe. I would assume we are delayed then,
given that Kafka will stall subsequent transactions from being visible to
the consumer because of that one outstanding transaction?
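For illustration, this is the consumer setting in play as far as I
understand it: with isolation.level=read_committed the last stable offset
(LSO) cannot move past the earliest still-open transaction, so one dangling
transaction holds back everything committed after it. A sketch with
placeholder names:

import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class StalledConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder
        props.setProperty("group.id", "downstream");              // placeholder
        props.setProperty("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.setProperty("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.setProperty("isolation.level", "read_committed"); // pins reads to the LSO

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(List.of("my-topic"));
        // poll() returns nothing past the LSO while a transaction is open
        ConsumerRecords<String, String> records =
                consumer.poll(Duration.ofSeconds(1));
    }
}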

And if that is the case, then understandably we have to abort those
dangling transactions before the 24-hour timeout. While there is probably a
way to do that, does Flink help? As in, is there a property that will abort
a transaction on Kafka, because we need it to, given the above?
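For context, the plain-Kafka mechanism I'm aware of is producer fencing: a
new producer that calls initTransactions() with the same transactional.id
aborts whatever transaction the previous incarnation left open. Whether
Flink exposes a knob that does this for you is exactly what I'm asking. A
sketch, with placeholder ids:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;

public class FenceDanglingTxn {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder
        props.setProperty("transactional.id", "pipe-sink-0");     // id of the dead pipe
        props.setProperty("key.serializer",
                "org.apache.kafka.common.serialization.ByteArraySerializer");
        props.setProperty("value.serializer",
                "org.apache.kafka.common.serialization.ByteArraySerializer");

        // initTransactions() bumps the producer epoch for this
        // transactional.id, fencing the old producer and aborting its
        // dangling transaction, which unblocks read_committed consumers.
        try (KafkaProducer<byte[], byte[]> fencer = new KafkaProducer<>(props)) {
            fencer.initTransactions();
        }
    }
}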

Again, I might have totally misunderstood the whole mechanics; if so,
apologies, and I would appreciate some clarification.


Thanks.
