Hi Cliff,

If auto.commit.enable is set to true, the offset will be committed in the following cases in addition to the periodic offset commit:

1. During a consumer rebalance, before releasing partition ownership. If consumer A owns partition P before the rebalance, it will commit the offset for partition P during the rebalance. If consumer B becomes the new owner of partition P after the rebalance, it will start from the committed offset, so there will be no duplicate messages.
2. When the consumer closes.

A rebalance will be triggered in the following cases:

1. A consumer joins/leaves the group.
2. A topic/partition change occurs on the subscribed topics (e.g. partition expansion for a topic, or a new topic is created and the consumer is using a wildcard that matches the new topic name).

To answer your question: a simple consumer should not interfere with a high-level consumer because it has no group management embedded. Typically a single high-level consumer group will not rebalance unless there is a topic/partition change. However, it is possible that the consumer itself dropped out of the group and rejoined. This typically happens when you have a ZooKeeper session timeout. In that case, you should see "ZK expired" in your log. You can search for that and see if that is the problem.

Jiangjie (Becket) Qin

On Thu, Oct 22, 2015 at 1:14 PM, Cliff Rhyne <crh...@signal.co> wrote:

> We did some more testing with logging turned on (I figured out why it
> wasn't working). We tried increasing the JVM memory capacity on our test
> server (it's lower than in production) and increasing the ZooKeeper
> timeouts. Neither changed the results. With trace logging enabled, we saw
> that we were getting rebalances even though there is only one high-level
> consumer running (there previously was a simple consumer that was told to
> disconnect, but that consumer only checked the offsets and never consumed
> data).
>
> - Is there possibly a race condition where the simple consumer has a hold
> on a partition and shutdown is called before starting a high-level consumer
> but shutdown is done asynchronously?
> - What are the various things that can cause a consumer rebalance other
> than adding / removing high-level consumers?
>
> Thanks,
> Cliff
>
> On Wed, Oct 21, 2015 at 4:20 PM, Cliff Rhyne <crh...@signal.co> wrote:
>
> > Hi Kris,
> >
> > Thanks for the tip. I'm going to investigate this further. I checked and
> > we have fairly short ZK timeouts and run with a smaller memory allocation
> > on the two environments where we encounter this issue. I'll let you all
> > know what I find.
> >
> > I saw this ticket https://issues.apache.org/jira/browse/KAFKA-2049 that
> > seems to be related to the problem (but it would only report that an issue
> > occurred). Are there any other open issues that could be worked on to
> > improve Kafka's handling of this situation?
> >
> > Thanks,
> > Cliff
> >
> > On Wed, Oct 21, 2015 at 2:53 PM, Kris K <squareksc...@gmail.com> wrote:
> >
> >> Hi Cliff,
> >>
> >> One other case I observed in my environment is when there were GC pauses
> >> on one of our high-level consumers in the group.
> >>
> >> Thanks,
> >> Kris
> >>
> >> On Wed, Oct 21, 2015 at 10:12 AM, Cliff Rhyne <crh...@signal.co> wrote:
> >>
> >> > Hi James,
> >> >
> >> > There are two scenarios we run:
> >> >
> >> > 1. Multiple partitions with one consumer per partition. This rarely has
> >> > starting/stopping of consumers, so the pool is very static. There is a
> >> > configured consumer timeout, which causes a ConsumerTimeoutException
> >> > to be thrown prior to the test starting. We handle this exception and
> >> > then resume consuming.
> >> > 2. Single partition with one consumer. This consumer is started by a
> >> > triggered condition (number of messages pending to be processed in the
> >> > Kafka topic, or a schedule). The consumer is stopped after processing is
> >> > completed.
> >> >
> >> > In both cases, based on my understanding, there shouldn't be a rebalance
> >> > as either a) all consumers are running or b) there's only one consumer
> >> > per partition. Also, the same consumer group is used by all consumers in
> >> > scenarios 1 and 2. Is there a good way to investigate whether rebalances
> >> > are occurring?
> >> >
> >> > Thanks,
> >> > Cliff
> >> >
> >> > On Wed, Oct 21, 2015 at 11:37 AM, James Cheng <jch...@tivo.com> wrote:
> >> >
> >> > > Do you have multiple consumers in a consumer group?
> >> > >
> >> > > I think that when a new consumer joins the consumer group, the
> >> > > existing consumers will stop consuming during the group rebalance, and
> >> > > when they start consuming again, they will consume from the last
> >> > > committed offset.
> >> > >
> >> > > You should get more verification on this, though. I might be
> >> > > remembering wrong.
> >> > >
> >> > > -James
> >> > >
> >> > > > On Oct 21, 2015, at 8:40 AM, Cliff Rhyne <crh...@signal.co> wrote:
> >> > > >
> >> > > > Hi,
> >> > > >
> >> > > > My team and I are looking into a problem where the Java high-level
> >> > > > consumer provides duplicate messages if we turn auto commit off
> >> > > > (using version 0.8.2.1 of the server and Java client). The expected
> >> > > > sequence of events is:
> >> > > >
> >> > > > 1. Start the high-level consumer and initialize a KafkaStream to get
> >> > > > a ConsumerIterator
> >> > > > 2. Consume n items (could be 10,000, could be 1,000,000) from the
> >> > > > iterator
> >> > > > 3. Commit the new offsets
> >> > > >
> >> > > > What we are seeing is that during step 2, some number of the n
> >> > > > messages are getting returned by the iterator in duplicate (in some
> >> > > > cases, we've seen n*5 messages consumed).
> >> > > > The problem appears to go away if we turn on auto commit (and
> >> > > > committing offsets to Kafka helped too), but auto commit causes
> >> > > > conflicts with our offset rollback logic. The issue seems to happen
> >> > > > more when we are in our test environment on a lower-cost cloud
> >> > > > provider.
> >> > > >
> >> > > > Diving into the Java and Scala classes including the
> >> > > > ConsumerIterator, it's not obvious what event causes a duplicate
> >> > > > offset to be requested or returned (there's even a loop that is
> >> > > > supposed to exclude duplicate messages in this class). I tried
> >> > > > turning on trace logging but my log4j config isn't getting the Kafka
> >> > > > client logs to write out.
> >> > > >
> >> > > > Does anyone have suggestions of where to look or how to enable
> >> > > > logging?
> >> > > >
> >> > > > Thanks,
> >> > > > Cliff
> >> > >
> >> > >
> >> > > ________________________________
> >> > >
> >> > > This email and any attachments may contain confidential and privileged
> >> > > material for the sole use of the intended recipient. Any review,
> >> > > copying, or distribution of this email (or any attachments) by others
> >> > > is prohibited. If you are not the intended recipient, please contact
> >> > > the sender immediately and permanently delete this email and any
> >> > > attachments. No employee or agent of TiVo Inc. is authorized to
> >> > > conclude any binding agreement on behalf of TiVo Inc. by email.
> >> > > Binding agreements with TiVo Inc. may only be made by a signed written
> >> > > agreement.
> >> > >
> >> >
> >> >
> >
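[Editor's note] The at-least-once behavior discussed in this thread (a rebalance with auto commit off redelivers everything since the last committed offset) can be illustrated with a small, self-contained simulation. This is not the Kafka API; the class and method names are made up, and it only sketches the semantics:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a partition consumer: after a rebalance, the new
// owner resumes from the last *committed* offset, as the 0.8.x high-level
// consumer does. This is a sketch of the semantics, not real Kafka code.
public class RebalanceDuplicateDemo {
    private long committedOffset = 0;   // last offset committed to ZK/Kafka

    // Deliver `count` message offsets starting from the committed offset.
    private List<Long> poll(int count) {
        List<Long> delivered = new ArrayList<>();
        for (long off = committedOffset; off < committedOffset + count; off++) {
            delivered.add(off);
        }
        return delivered;
    }

    // Consume twice with a rebalance in between and no commit: the second
    // owner resumes from the stale committed offset, redelivering messages.
    public List<Long> simulate() {
        List<Long> seen = new ArrayList<>(poll(3)); // consumer A reads 0, 1, 2
        // rebalance here: auto commit is off, so committedOffset is still 0
        seen.addAll(poll(3));                       // consumer B re-reads 0, 1, 2
        committedOffset = 3;                        // commit only after processing
        return seen;
    }

    public static void main(String[] args) {
        // prints [0, 1, 2, 0, 1, 2] -- the first batch is seen twice
        System.out.println(new RebalanceDuplicateDemo().simulate());
    }
}
```

This is why committing more often (as auto commit does) narrows the duplicate window but conflicts with Cliff's rollback logic: the committed offset is the single resume point for every rebalance.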
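[Editor's note] On Cliff's question about enabling client logging: the 0.8.x Java consumer logs through log4j, so a fragment along these lines in log4j.properties should surface rebalance activity and "ZK expired" messages (the `stdout` appender name is illustrative; use an appender you have defined):

```properties
# Turn up the old (0.8.x) consumer internals; the kafka.consumer package
# covers the ZK-based rebalance logic that logs ownership changes.
log4j.logger.kafka=INFO, stdout
log4j.logger.kafka.consumer=TRACE, stdout
log4j.additivity.kafka=false
log4j.additivity.kafka.consumer=false
```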