Thanks Jay. After setting up a Kafka instance and running some examples, I think I understand it better now. As the paper points out, Kafka made a design choice that simplifies the problem: "at least once" delivery really means "at least once, given a sufficiently long time and sufficiently many tries". In fact, as the paper says, the broker never bothers to remember a consumer's offset, let alone whether the consumer actually succeeded in reading the data; it simply assumes that, within the retention period, the client will eventually succeed. Put another way, the consumer has to take on the responsibility of ensuring "at least once consumption" (if the application requires it), by means such as atomically committing the offset together with the data, as you mentioned.
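To make the atomic-commit idea concrete, here is a rough sketch of the pattern I have in mind (my own illustration, not code from the paper or from Kafka itself): the consumer disables Kafka-side offset commits and writes the consumed data and the next offset in the same database transaction. The table names, connection string, and use of the Java consumer API are placeholders for illustration.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class AtomicOffsetConsumer {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "example-group");
        props.put("enable.auto.commit", "false"); // we manage offsets ourselves
        props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
             Connection db = DriverManager.getConnection("jdbc:postgresql://localhost/demo")) {

            db.setAutoCommit(false);
            consumer.subscribe(Collections.singletonList("events"));
            // On restart you would read next_offset back from the offsets table
            // and consumer.seek() to it; omitted here for brevity.

            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    // Write the data and the offset in the SAME database transaction.
                    try (PreparedStatement insertData = db.prepareStatement(
                             "INSERT INTO events(payload) VALUES (?)");
                         PreparedStatement saveOffset = db.prepareStatement(
                             "UPDATE offsets SET next_offset = ? WHERE topic = ? AND partition = ?")) {
                        insertData.setString(1, record.value());
                        insertData.executeUpdate();
                        saveOffset.setLong(1, record.offset() + 1);
                        saveOffset.setString(2, record.topic());
                        saveOffset.setInt(3, record.partition());
                        saveOffset.executeUpdate();
                    }
                    db.commit(); // offset and data become visible together, or not at all
                }
            }
        }
    }
}

Since the restart position comes from the same transaction that recorded the state change, a record is either fully applied with its offset or not at all, which is what gives the effectively exactly-once behavior downstream.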
Regards
Yang

On Wed, Aug 7, 2013 at 4:26 PM, Jay Kreps <jay.kr...@gmail.com> wrote:
> Yeah I'm not sure how good our understanding was when we wrote that.
>
> Here is my take now:
>
> At least once delivery is not that hard but you need the ability to
> deduplicate things--basically you turn the "at least once delivery channel"
> into the "exactly once channel" by throwing away duplicates. This means
> 1. Some key assigned by the producer that allows the broker to detect a
> re-published message to make publishing idempotent. This solves the problem
> of producer retries. This key obviously has to be highly available--i.e. if
> the leader for a partition fails the follower must correctly deduplicate
> for all committed messages.
> 2. Some key that allows the consumer to detect a re-consumed message.
>
> The first item is actually pretty doable as we can track some producer
> sequence in the log and use it to avoid duplicate appends. We just need to
> implement it. I think this can be done in a way that is fairly low overhead
> and can be "on by default".
>
> We actually already provide such a key to the consumer--the offset. Making
> use of this is actually somewhat application dependent. Obviously providing
> exactly-once guarantees in the case of no failures is easy and we already
> handle that case. The harder part is if a consumer process dies to ensure
> that it restarts in a position that exactly matches the state changes that
> it has made in some destination system. If the consumer application uses
> the offset in a way that makes updates idempotent that will work, or if
> they commit their offset and data atomically that works. However in general
> the goal of a consumer is to produce some state change in another system (a
> db, hdfs, some other data system, etc) and having a general solution that
> works with all of these is hard since they have very different limitations
> and features.
>
> -Jay
>
>
> On Wed, Aug 7, 2013 at 4:00 PM, Yang <teddyyyy...@gmail.com> wrote:
>
> > I wonder why at-least-once guarantee is easier to maintain than
> > exactly-once (in that the latter requires 2PC while the former does not,
> > according to
> > http://research.microsoft.com/en-us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf
> > )
> >
> > if u achieve at-least-once guarantee, you are able to assert between 2
> > cases "nothing" vs ">=1 delivered", which can be seen as 2 different
> > answers 0 and 1. isn't this as hard as the common Byzantine general
> > problem?
> >
> > Thanks
> > Yang
> >
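As a side note on Jay's first item above, here is a toy sketch of what broker-side deduplication of producer retries could look like. This is purely an illustration of the idea; it is not how Kafka works today, and the class and method names are made up:

import java.util.HashMap;
import java.util.Map;

/**
 * Toy sketch of the idea in Jay's item 1: the broker remembers the highest
 * sequence number appended per producer and drops any re-published message
 * whose sequence it has already seen, making the append idempotent.
 */
public class IdempotentAppendSketch {
    private final Map<String, Long> lastSeqByProducer = new HashMap<>();

    /** Returns true if the message was appended, false if it was a duplicate retry. */
    public synchronized boolean append(String producerId, long sequence, byte[] payload) {
        long lastSeq = lastSeqByProducer.getOrDefault(producerId, -1L);
        if (sequence <= lastSeq) {
            return false; // already appended: the producer is just retrying
        }
        // appendToLog(payload) would go here
        lastSeqByProducer.put(producerId, sequence);
        return true;
    }
}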