Guozhang,

Exactly. This is the crux of the matter. Because flushing is async, the log is slightly out of date with respect to the run-time state, and a failure of all replicas might take the data slightly back in time.
Given this, do you think that KIP-98 gives an all-or-nothing, no-matter-what guarantee for Kafka transactions? I think the key is whether the data which is asynchronously flushed is guaranteed to be recovered atomically in all cases. Asynchronous but atomic would be good.

Andrew Schofield
IBM Watson and Cloud Platform

> From: Guozhang Wang <wangg...@gmail.com>
> Sent: 09 December 2016 22:59
> To: dev@kafka.apache.org
> Subject: Re: [DISCUSS] KIP-98: Exactly Once Delivery and Transactional Messaging
>
> Onur,
>
> I understand your question now. So it is indeed possible that after
> commitTxn() returned the messages could still be lost permanently if all
> replicas failed before the data was flushed to disk. This is by virtue of
> Kafka's design to rely on replication (probably in memory) for high
> availability, hence async flushing. This scenario already exists today and
> KIP-98 did not intend to change this factor in any way.
>
> Guozhang