Re: Lost messages and messed up offsets

Tom van den Berge Thu, 30 Nov 2017 01:03:46 -0800

The consumers are using default settings, which means that
enable.auto.commit=true and auto.commit.interval.ms=5000. I'm not
committing manually; just consuming messages.
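
For reference, here is a minimal sketch of what that configuration amounts
to if the two commit-related defaults are spelled out explicitly. The broker
address, the group id, and the choice of StringDeserializer are assumptions
made only for illustration; they are not taken from my actual setup.

```java
import java.util.Properties;

public class DefaultConsumerConfig {

    // Builds consumer properties with the commit-related defaults written out.
    // bootstrapServers and groupId are placeholders supplied by the caller.
    static Properties build(String bootstrapServers, String groupId) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("group.id", groupId);
        // Assumed deserializer; the real one depends on the message format.
        props.setProperty("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.setProperty("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        // The two defaults mentioned above, stated explicitly.
        props.setProperty("enable.auto.commit", "true");
        props.setProperty("auto.commit.interval.ms", "5000");
        return props;
    }

    public static void main(String[] args) {
        // Hypothetical broker address and group id.
        Properties props = build("localhost:9092", "example-group");
        System.out.println("enable.auto.commit=" + props.getProperty("enable.auto.commit"));
        System.out.println("auto.commit.interval.ms=" + props.getProperty("auto.commit.interval.ms"));
    }
}
```

These properties would be passed to the KafkaConsumer constructor; leaving
the last two out gives the same behaviour, since they are the defaults.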


On Thu, Nov 30, 2017 at 1:09 AM, Frank Lyaruu <flya...@gmail.com> wrote:

> Do you commit the received messages? Either by doing it manually or setting
> enable.auto.commit and auto.commit.interval.ms?
>
> On Wed, Nov 29, 2017 at 11:15 PM, Tom van den Berge <
> tom.vandenbe...@gmail.com> wrote:
>
> > I'm using Kafka 0.10.0.
> >
> > I'm reading messages from a single topic (20 partitions), using 4
> > consumers (one group), using a standard java consumer with default
> > configuration, except for the key and value deserializer, and a group
> > id; no other settings.
> >
> > We've been experiencing a serious problem a few times now, after a large
> > burst of messages (75000) have been posted to the topic. The consumer lag
> > (as reported by Kafka's kafka-consumer-groups.sh) immediately shows a
> > huge lag, which is expected. The consumers start processing the messages,
> > which is expected to take them at least 30 minutes. In the mean time, more
> > messages are posted to the topic, but at a "normal" rate, which the
> > consumers normally handle easily. The problem is that the reported
> > consumer lag is not decreasing at all. After some 30 minutes, it has even
> > increased slightly. This would mean that the consumers are not able to
> > process the backlog at all, which is extremely unlikely.
> >
> > After a restart of all consumer applications, something really surprising
> > happens: the lag immediately drops to nearly 0! It is technically
> > impossible that the consumers really processed all messages in a matter
> > of seconds. Manual verification showed that many messages were not
> > processed at all; they seem to have disappeared somehow. So it seems that
> > restarting the consumers somehow messed up the offset (I think).
> >
> > On top of that, I noticed that the reported lag shows seemingly
> > impossible figures. During the time that the lag was not decreasing,
> > before the restart of the consumers, the "current offset" that was
> > reported for some partitions decreased. To my knowledge, that is
> > impossible.
> >
> > Does anyone have an idea on how this could have happened?
> >
>
