Hi James,

There are two scenarios we run:

1. Multiple partitions with one consumer per partition.  Consumers are
rarely started or stopped in this scenario, so the pool is very static.
There is a configured consumer timeout, which causes a
ConsumerTimeoutException to be thrown before the test starts.  We catch
this exception and then resume consuming (see the sketch after this list).
2. Single partition with one consumer.  This consumer is started when a
condition is triggered (a threshold of messages pending in the Kafka topic,
or a schedule) and is stopped after processing completes.
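
For reference, the consume loop in scenario 1 is shaped roughly like this.
This is a simplified sketch against the 0.8.2.1 high-level consumer API;
the ZooKeeper address, topic, group id, target count, and process() are
stand-ins, not our actual values:

import java.util.Collections;
import java.util.Properties;
import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.ConsumerIterator;
import kafka.consumer.ConsumerTimeoutException;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;
import kafka.message.MessageAndMetadata;

public class BatchConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("zookeeper.connect", "zk-host:2181"); // stand-in address
        props.put("group.id", "our-group");             // shared by both scenarios
        props.put("auto.commit.enable", "false");       // we commit ourselves
        props.put("consumer.timeout.ms", "5000");       // the configured timeout

        ConsumerConnector connector =
            Consumer.createJavaConsumerConnector(new ConsumerConfig(props));
        KafkaStream<byte[], byte[]> stream = connector
            .createMessageStreams(Collections.singletonMap("our-topic", 1))
            .get("our-topic").get(0);
        ConsumerIterator<byte[], byte[]> it = stream.iterator();

        int consumed = 0;
        int target = 10000; // n messages for this run
        while (consumed < target) {
            try {
                process(it.next());
                consumed++;
            } catch (ConsumerTimeoutException e) {
                // consumer.timeout.ms elapsed with no message; resume consuming
            }
        }
        connector.commitOffsets(); // commit once the batch is done
        connector.shutdown();
    }

    static void process(MessageAndMetadata<byte[], byte[]> msg) {
        // stand-in for our actual message handler
    }
}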

In both cases, based on my understanding there shouldn't be a rebalance,
since either (a) all consumers are already running or (b) there is only one
consumer and one partition.  Also, all consumers in scenarios 1 and 2 use
the same consumer group.  Is there a good way to investigate whether
rebalances are occurring?
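
One thing we may try ourselves in the meantime is raising the consumer's
log level programmatically, since (if memory serves) the old consumer logs
its rebalance activity ("begin rebalancing consumer ..." lines) from
kafka.consumer.ZookeeperConsumerConnector.  A rough sketch with log4j 1.x,
which the 0.8.2.1 consumer logs through; the class and method names here
are our own placeholders:

import org.apache.log4j.BasicConfigurator;
import org.apache.log4j.Level;
import org.apache.log4j.Logger;

public class ConsumerDebugLogging {
    // Call once at startup, before creating the consumer connector.
    public static void enable() {
        // Attach a console appender in case no log4j.properties is found
        // on the classpath.
        BasicConfigurator.configure();
        // The rebalance lines should come from this logger.
        Logger.getLogger("kafka.consumer.ZookeeperConsumerConnector")
              .setLevel(Level.INFO);
        // Or open the whole kafka namespace up for trace-level digging.
        Logger.getLogger("kafka").setLevel(Level.TRACE);
    }
}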

Thanks,
Cliff

On Wed, Oct 21, 2015 at 11:37 AM, James Cheng <jch...@tivo.com> wrote:

> Do you have multiple consumers in a consumer group?
>
> I think that when a new consumer joins the consumer group, the existing
> consumers stop consuming during the group rebalance, and when they start
> consuming again they resume from the last committed offset.
>
> You should get more verification on this, though; I might be remembering
> wrong.
>
> -James
>
> > On Oct 21, 2015, at 8:40 AM, Cliff Rhyne <crh...@signal.co> wrote:
> >
> > Hi,
> >
> > My team and I are looking into a problem where the Java high-level
> > consumer provides duplicate messages if we turn auto commit off (using
> > version 0.8.2.1 of the server and Java client).  The expected sequence
> > of events is:
> >
> > 1. Start high-level consumer and initialize a KafkaStream to get a
> > ConsumerIterator
> > 2. Consume n items (could be 10,000, could be 1,000,000) from the
> > iterator
> > 3. Commit the new offsets
> >
> > What we are seeing is that during step 2, some number of the n messages
> > are getting returned by the iterator in duplicate (in some cases, we've
> > seen n*5 messages consumed).  The problem appears to go away if we turn
> > on auto commit (and committing offsets to Kafka helped too), but auto
> > commit conflicts with our offset rollback logic.  The issue seems to
> > happen more often in our test environment on a lower-cost cloud
> > provider.
> >
> > Diving into the Java and Scala classes, including ConsumerIterator,
> > it's not obvious what event causes a duplicate offset to be requested
> > or returned (there's even a loop in that class that is supposed to
> > exclude duplicate messages).  I tried turning on trace logging, but my
> > log4j config isn't getting the Kafka client logs to write out.
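> >
> > For concreteness, an application-level guard in the same spirit as that
> > loop would skip any message whose offset isn't strictly greater than
> > the last one seen for its partition.  A hypothetical sketch, assuming a
> > ConsumerIterator it and a handler process() like the ones above, plus
> > java.util.HashMap:
> >
> > Map<Integer, Long> lastSeen = new HashMap<Integer, Long>();
> > while (it.hasNext()) {
> >     MessageAndMetadata<byte[], byte[]> msg = it.next();
> >     Long last = lastSeen.get(msg.partition());
> >     if (last == null || msg.offset() > last) {
> >         process(msg); // first delivery of this offset
> >         lastSeen.put(msg.partition(), msg.offset());
> >     }                 // otherwise a duplicate delivery; skip it
> > }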
> >
> > Does anyone have suggestions on where to look or how to enable logging?
> >
> > Thanks,
> > Cliff