Do you commit the received messages, either manually or by setting enable.auto.commit and auto.commit.interval.ms?
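To illustrate the two options, here is a minimal Java sketch. It only builds the consumer settings as java.util.Properties, so it runs without kafka-clients on the classpath. The property names are the standard Kafka consumer config keys; the broker address, group id, StringDeserializer choice and the 1000 ms interval are placeholder assumptions, not values from this thread.

```java
import java.util.Properties;

// Sketch only: consumer settings as plain Properties (no kafka-clients needed).
// Property names are standard Kafka consumer keys; all values are placeholders.
class CommitConfigSketch {

    // Option 1: automatic commits in the background.
    // enable.auto.commit defaults to true and auto.commit.interval.ms to 5000,
    // so a consumer with an otherwise default configuration already auto-commits.
    static Properties autoCommit(String bootstrap, String groupId, long intervalMs) {
        Properties props = base(bootstrap, groupId);
        props.setProperty("enable.auto.commit", "true");
        props.setProperty("auto.commit.interval.ms", Long.toString(intervalMs));
        return props;
    }

    // Option 2: auto-commit disabled; the application itself must call
    // consumer.commitSync() (or commitAsync()) after processing records.
    static Properties manualCommit(String bootstrap, String groupId) {
        Properties props = base(bootstrap, groupId);
        props.setProperty("enable.auto.commit", "false");
        return props;
    }

    // Settings shared by both options: broker address, group id and the
    // key/value deserializers (StringDeserializer is an assumption here).
    private static Properties base(String bootstrap, String groupId) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrap);
        props.setProperty("group.id", groupId);
        props.setProperty("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.setProperty("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        return props;
    }

    public static void main(String[] args) {
        // "localhost:9092" and "my-group" are hypothetical placeholder values.
        System.out.println(autoCommit("localhost:9092", "my-group", 1000));
        System.out.println(manualCommit("localhost:9092", "my-group"));
    }
}
```

In a real application these Properties would be passed to new KafkaConsumer<>(props). With enable.auto.commit=false, nothing is committed unless the code calls commitSync() or commitAsync(), so the group's committed offsets would not advance.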
On Wed, Nov 29, 2017 at 11:15 PM, Tom van den Berge <tom.vandenbe...@gmail.com> wrote:

> I'm using Kafka 0.10.0.
>
> I'm reading messages from a single topic (20 partitions), using 4 consumers
> (one group), using a standard java consumer with default configuration,
> except for the key and value deserializer, and a group id; no other
> settings.
>
> We've been experiencing a serious problem a few times now, after a large
> burst of messages (75000) have been posted to the topic. The consumer lag
> (as reported by Kafka's kafka-consumer-groups.sh) immediately shows a huge
> lag, which is expected. The consumers start processing the messages, which
> is expected to take them at least 30 minutes. In the meantime, more
> messages are posted to the topic, but at a "normal" rate, which the
> consumers normally handle easily. The problem is that the reported consumer
> lag is not decreasing at all. After some 30 minutes, it has even increased
> slightly. This would mean that the consumers are not able to process the
> backlog at all, which is extremely unlikely.
>
> After a restart of all consumer applications, something really surprising
> happens: the lag immediately drops to nearly 0! It is technically
> impossible that the consumers really processed all messages in a matter of
> seconds. Manual verification showed that many messages were not processed
> at all; they seem to have disappeared somehow. So it seems that restarting
> the consumers somehow messed up the offset (I think).
>
> On top of that, I noticed that the reported lag shows seemingly impossible
> figures. During the time that the lag was not decreasing, before the
> restart of the consumers, the "current offset" that was reported for some
> partitions decreased. To my knowledge, that is impossible.
>
> Does anyone have an idea on how this could have happened?
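For reference, the lag that kafka-consumer-groups.sh reports is computed per partition from the group's committed offset (the "current offset"), not from what the consumers have actually processed: lag = log-end offset - current offset. The small Java sketch below just does that arithmetic; the offsets in main() are hypothetical numbers, not data from this thread.

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch of the per-partition lag arithmetic: lag = log-end offset - committed
// ("current") offset. All offsets used below are hypothetical.
class LagSketch {

    static long lag(long logEndOffset, long committedOffset) {
        return logEndOffset - committedOffset;
    }

    // Sums the lag over partitions that have a committed offset; partitions
    // with no committed offset for the group are skipped.
    static long totalLag(Map<Integer, Long> logEnd, Map<Integer, Long> committed) {
        long total = 0;
        for (Map.Entry<Integer, Long> e : logEnd.entrySet()) {
            Long c = committed.get(e.getKey());
            if (c != null) {
                total += lag(e.getValue(), c);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        Map<Integer, Long> logEnd = new TreeMap<>();
        logEnd.put(0, 1_000L);
        logEnd.put(1, 1_200L);
        Map<Integer, Long> committed = new TreeMap<>();
        committed.put(0, 900L);
        committed.put(1, 1_150L);
        System.out.println(totalLag(logEnd, committed)); // 150

        // New messages arrive but no new offsets are committed: the reported
        // lag grows, regardless of how many records were actually processed.
        logEnd.put(0, 1_050L);
        System.out.println(totalLag(logEnd, committed)); // 200
    }
}
```

So a lag that stays flat or grows while the consumers are busy points at offsets not being committed, which is why the commit settings matter here.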