I'd suggest using the new consumer (KafkaConsumer) instead of the old one. We've refined the implementation so that, even with auto-commit, you should get at-least-once processing in the worst case (and exactly-once when there are no failures). The 0.10.0.0 release should get all of these semantics right.
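For reference, here is a minimal sketch of how the new consumer (org.apache.kafka.clients.consumer.KafkaConsumer) might be configured with auto-commit enabled. The helper class and method names, the broker address localhost:9092 and the group name are all hypothetical placeholders. It uses only java.util.Properties so it compiles without the Kafka jar; with kafka-clients on the classpath the result would be passed to `new KafkaConsumer<String, String>(props)`. Note that the new consumer uses `bootstrap.servers` and `enable.auto.commit`, whereas the old 0.8.x consumer used `zookeeper.connect` and `auto.commit.enable`.

```java
import java.util.Properties;

// Hypothetical helper: builds a config for the new (0.9+/0.10) consumer with auto-commit on.
public class NewConsumerConfig {

    static Properties autoCommitProps(String bootstrapServers, String groupId, long commitIntervalMs) {
        Properties props = new Properties();
        // The new consumer talks to the brokers directly rather than to ZooKeeper.
        props.put("bootstrap.servers", bootstrapServers);
        props.put("group.id", groupId);
        // The client commits consumed offsets periodically as part of its poll() calls.
        props.put("enable.auto.commit", "true");
        // A smaller interval narrows the window of messages that can be re-read after a failure.
        props.put("auto.commit.interval.ms", Long.toString(commitIntervalMs));
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        return props;
    }

    public static void main(String[] args) {
        // Placeholder broker address and group name (assumptions, not from the thread).
        Properties props = autoCommitProps("localhost:9092", "example-group", 1000);
        System.out.println(props.getProperty("auto.commit.interval.ms")); // 1000
    }
}
```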
-Ewen

On Mon, Jul 11, 2016 at 7:05 AM, Gerard Klijs <gerard.kl...@dizzit.com> wrote:

> You could set auto.commit.interval.ms to a lower value; in your example
> it is 10 seconds, which can be a lot of messages. I don't really see how it
> could be prevented any further, since offsets can only be committed by a
> consumer for the partitions it is assigned to. I do believe there is some
> work in progress to make the assignment of partitions to consumers
> somewhat sticky.
> In that case, when a consumer is assigned the same partitions after
> the rebalance as it had before, it should not be necessary to
> consume the same data again in those partitions.
>
> On Mon, Jul 11, 2016 at 3:18 PM Michael Luban <mluban....@gmail.com>
> wrote:
>
> > Using the 0.8.2.1 client.
> >
> > Is it possible to statistically minimize the possibility of duplication in
> > this scenario, or has this behavior been corrected in a later client
> > version? Or is the test flawed?
> >
> > https://gist.github.com/mluban/03a5c0d9221182e6ddbc37189c4d3eb0

--
Thanks,
Ewen
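To put a rough number on the suggestion to lower auto.commit.interval.ms, the sketch below is a toy model (not Kafka code) of how many messages the next owner of a partition would re-read after a rebalance. It assumes a steady processing rate, commits that fire only at multiples of the interval, and no commit at the moment the partition is lost, i.e. the worst case such as a crash. The class and method names are hypothetical.

```java
// Hypothetical toy model of the redelivery window with periodic auto-commit.
// Assumptions: constant processing rate, commits only at multiples of the
// interval, and no final commit when the partition is lost (worst case).
public class RedeliverySim {

    /** Messages processed up to time tMs at a constant rate (messages/second). */
    static long processed(long ratePerSec, long tMs) {
        return ratePerSec * tMs / 1000;
    }

    /**
     * Messages that will be consumed again by the partition's next owner:
     * everything processed after the last auto-commit before the rebalance.
     */
    static long duplicatesAfterRebalance(long ratePerSec, long commitIntervalMs, long rebalanceAtMs) {
        long lastCommitMs = (rebalanceAtMs / commitIntervalMs) * commitIntervalMs;
        long committedOffset = processed(ratePerSec, lastCommitMs);
        return processed(ratePerSec, rebalanceAtMs) - committedOffset;
    }

    public static void main(String[] args) {
        // 1000 messages/second, rebalance 25.5 seconds into the run.
        System.out.println(duplicatesAfterRebalance(1000, 10_000, 25_500)); // 5500
        System.out.println(duplicatesAfterRebalance(1000, 1_000, 25_500));  // 500
    }
}
```

Under these assumptions the number of duplicates is bounded by roughly rate × interval, so lowering the interval shrinks the window but cannot remove it; only committing at the moment partitions are revoked (or sticky assignment, as mentioned above) avoids the re-read entirely.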