sorry to keep bugging the list, but I feel like I am either missing something important, or I'm finding something wrong w/ the standard consumer api, (or maybe just the docs need some clarification).
I started to think that I should probably just accept at least once semantics ... but I eventually realized that I'm not even sure we really get an at least once guarantee. I think it really might be zero-or-more. Or rather, messages will get pulled off the kafka queue at least once. but that doesn't mean your app will actually *process* those messages at least once -- there might be messages it never processes. Consider a really basic reader of a kafka queue: while(it.hasNext()){ val msg = it.next() doSomething(msg) } the question is, do I have any guarantees on how many times doSomething() is called on everything in the queue? I think the "guarantee" is: 1) most messages will get processed excatly once 2) around a restart, a chunk of msgs will get processed at least once, but probably more than once 3) around a restart, it is possible that one message will get processed ZERO times (1) & (2) are probably clear, so lemme explain how I think (3) could happen. Lets imagine messages a,b,c,... and two threads, one reading from the stream, and one thread that periodically commits the offsets. Imagine this sequence of events: ==Reader== -initializes w/ offset pointing to "a" -hasNext() ---> makeNext() will read "a" and update the local offset to "b" -msg = "a" -doSomething("a") -hasNext() ----> makeNext() will read "b" and update the local offset "c" ==Commiter== -commitOffsets stores the current offset as "c" =====PROCESS DIES===== ===== RESTARTS ===== ==Reader== -initializes w/ offset pointing to "c" -hasNext() --> makeNext() will read "c" and update local offset to "d" -msg = "c" -doSomething("c") ... note that in this scenario, doSomething("b") was never called. Probably for a lot of applications this doesn't matter. But seems like it this could be terrible for some apps. I can't think of any way of preventing it from user code. unless, maybe when the offsets get committed, it is always *before* the last thing read? eg., in my example, it would store the next offset as "b" or earlier? Is there a flaw in my logic? Do committed offsets always "undershoot" to prevent this? thanks, Imran