The consumers are using default settings, which means that enable.auto.commit=true and auto.commit.interval.ms=5000. I'm not committing manually; just consuming messages.
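
For reference, here is a minimal sketch of what that configuration amounts to with the two auto-commit defaults written out explicitly. The broker address, the group id and the String deserializers are placeholders chosen for illustration; the class and method names are hypothetical as well.

```java
import java.util.Properties;

class ConsumerDefaults {
    // Builds consumer settings with the auto-commit defaults spelled out.
    // Leaving the last two properties unset gives the same behaviour.
    static Properties explicitDefaults(String bootstrapServers, String groupId) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers); // placeholder value
        props.setProperty("group.id", groupId);                   // placeholder value
        // Assumption: String keys and values; the real deserializers are not stated.
        props.setProperty("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.setProperty("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        // Consumer defaults, made explicit.
        props.setProperty("enable.auto.commit", "true");
        props.setProperty("auto.commit.interval.ms", "5000");
        return props;
    }

    public static void main(String[] args) {
        Properties props = explicitDefaults("localhost:9092", "my-group");
        props.forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

With kafka-clients on the classpath, these properties would be passed to new KafkaConsumer<>(props); the sketch itself only needs the JDK.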

On Thu, Nov 30, 2017 at 1:09 AM, Frank Lyaruu <flya...@gmail.com> wrote:

> Do you commit the received messages? Either by doing it manually or setting
> enable.auto.commit and auto.commit.interval.ms?
>
> On Wed, Nov 29, 2017 at 11:15 PM, Tom van den Berge <
> tom.vandenbe...@gmail.com> wrote:
>
> > I'm using Kafka 0.10.0.
> >
> > I'm reading messages from a single topic (20 partitions), using 4 consumers
> > (one group), using a standard java consumer with default configuration,
> > except for the key and value deserializer, and a group id; no other
> > settings.
> >
> > We've been experiencing a serious problem a few times now, after a large
> > burst of messages (75000) have been posted to the topic. The consumer lag
> > (as reported by Kafka's kafka-consumer-groups.sh) immediately shows a huge
> > lag, which is expected. The consumers start processing the messages, which
> > is expected to take them at least 30 minutes. In the mean time, more
> > messages are posted to the topic, but at a "normal" rate, which the
> > consumers normally handle easily. The problem is that the reported consumer
> > lag is not decreasing at all. After some 30 minutes, it has even increased
> > slightly. This would mean that the consumers are not able to process the
> > backlog at all, which is extremely unlikely.
> >
> > After a restart of all consumer applications, something really surprising
> > happens: the lag immediately drops to nearly 0! It is technically
> > impossible that the consumers really processed all messages in a matter of
> > seconds. Manual verification showed that many messages were not processed
> > at all; they seem to have disappeared somehow. So it seems that restarting
> > the consumers somehow messed up the offset (I think).
> >
> > On top of that, I noticed that the reported lag shows seemingly impossible
> > figures. During the time that the lag was not decreasing, before the
> > restart of the consumers, the "current offset" that was reported for some
> > partitions decreased. To my knowledge, that is impossible.
> >
> > Does anyone have an idea on how this could have happened?