Hey Vinay,

Thanks, that's really helpful. It does seem like there might be a problem with the heartbeat trigger logic. I'll see if I can reproduce what you're seeing locally. It might also help if you could share a snippet of your poll loop.
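To make sure we're comparing the same thing, here is roughly the shape of loop I have in mind from your description. It's only a sketch: the class name, topic, group id and processing step are placeholders rather than your actual code; only the 15-second proactive commit, session.timeout.ms=30000 and heartbeat.interval.ms=3000 are taken from what you described.

import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class ProactiveCommitLoop {                            // placeholder name
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");     // placeholder
        props.put("group.id", "test-group");                  // placeholder
        props.put("enable.auto.commit", "false");
        props.put("session.timeout.ms", "30000");
        props.put("heartbeat.interval.ms", "3000");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Arrays.asList("test-topic"));      // placeholder topic

        // Offsets of records processed since the last commit, per partition.
        Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();
        long lastCommit = System.currentTimeMillis();

        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(1000);
            for (ConsumerRecord<String, String> record : records) {
                // ... per-record processing here (the part that can take a long time) ...
                offsets.put(new TopicPartition(record.topic(), record.partition()),
                            new OffsetAndMetadata(record.offset() + 1));
                // Proactive commit roughly every 15 seconds to try to keep the session alive.
                if (System.currentTimeMillis() - lastCommit > 15000) {
                    consumer.commitSync(offsets);
                    offsets.clear();
                    lastCommit = System.currentTimeMillis();
                }
            }
        }
    }
}

If your loop differs from this in any meaningful way, especially around where commitSync is called relative to poll and the metadata refresh, that's exactly the part I'd like to see.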
Thanks,
Jason

On Thu, Apr 28, 2016 at 11:55 AM, vinay sharma <vinsharma.t...@gmail.com> wrote:

> Hi Jason,
>
> I reverted back to KAFKA-3149. The producer still had issues related to schema, but my consumer worked.
>
> Now the consumer worked as expected. Although I did not encounter an error and the generation was not marked dead by the coordinator, I still see that successful heartbeat responses are not logged as expected. My observations are the following:
> 1) A meta refresh also triggers a heartbeat request. I say this because sometimes I see 2 heartbeat responses logged just a few milliseconds apart where a meta refresh and a proactive commit happened almost simultaneously.
> 2) I still see that some commitSync requests do not have a heartbeat logged before or after the commit, although the next proactive commit happened just in time and that time the heartbeat request was successful, which saved the session. In the attached log you can see that a poll was done at 14:17:41, a commit happened at 14:17:56 and another commit happened at 14:18:14. The only heartbeat response logged during this time is at 14:18:14, which is 29 seconds after the poll, whereas a commit was performed 15 seconds after the poll. The heartbeat interval was 3000.
> 3) There are long pauses between heartbeat responses in the logs which should cause the session to time out, but that is not happening. This implies that commits not only trigger a heartbeat but also act as one.
>
> Regards,
> Vinay
>
> On Thu, Apr 28, 2016 at 12:29 PM, Jason Gustafson <ja...@confluent.io> wrote:
>
>> Ah, yeah. That's probably caused by the new topic metadata version, which isn't supported on 0.9 brokers. To test on trunk, you'd have to upgrade the brokers as well. Either that or you can rewind to before KAFKA-3306 (which was just committed the day before yesterday)?
>>
>> -Jason
>>
>> On Thu, Apr 28, 2016 at 9:01 AM, vinay sharma <vinsharma.t...@gmail.com> wrote:
>>
>> > Hi Jason,
>> >
>> > I built the kafka-client and tried using it, but my producers and consumers started throwing the exception below. Is 0.10 not going to be compatible with brokers on version 0.9.0.1? Or do I need to make some config changes to the producers/consumers to make them compatible with brokers on the old version? Or do I need to upgrade the brokers to the new version as well?
>> >
>> > org.apache.kafka.common.protocol.types.SchemaException: Error reading field 'brokers': Error reading field 'host': Error reading string of length 17995, only 145 bytes available
>> >     at org.apache.kafka.common.protocol.types.Schema.read(Schema.java:75)
>> >     at org.apache.kafka.clients.NetworkClient.parseResponse(NetworkClient.java:380)
>> >     at org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:449)
>> >
>> > Regards,
>> > Vinay Sharma
>> >
>> > On Thu, Apr 28, 2016 at 12:32 AM, Jason Gustafson <ja...@confluent.io> wrote:
>> >
>> > > Hey Vinay,
>> > >
>> > > Any chance you can run the same test against trunk? I'm guessing this might be caused by a bug in the 0.9 consumer which basically causes some requests to fail when a bunch of them are sent to the broker at the same time.
>> > >
>> > > -Jason
>> > >
>> > > On Wed, Apr 27, 2016 at 1:02 PM, vinay sharma <vinsharma.t...@gmail.com> wrote:
>> > >
>> > > > Hi Jason,
>> > > >
>> > > > This makes sense. We use 0.9.0.1 and we do have the session timeout set a bit high, but nothing can guarantee that processing will never run longer than the session timeout. I am trying to test a proactive commit approach to handle cases where processing takes an unusually long time. To keep the consumer's session alive during long processing, I proactively commitSync the processed records every 15 seconds. The session timeout I kept is 30000.
>> > > >
>> > > > *Problem:*
>> > > > With a heartbeat interval of 3000, I expect a heartbeat request to be sent on each proactive commit, which happens every 15 seconds. In my tests I see that this does not always happen. I see a time window greater than 30 seconds in which no heartbeat is sent even though there were commits in that duration. After this window I see a couple of successful heartbeat responses until the end of the poll, but as soon as I poll again and call commitSync in the next poll I get an "ILLEGAL_GENERATION" error. This error always happens just after a meta refresh or in the next poll's processing after a meta refresh. I am attaching logs where I kept the meta refresh interval at 40000, 90000 and 500000.
>> > > >
>> > > > *Test results:*
>> > > > Test with meta refresh 40000 ms ran around 70 seconds from the 1st poll.
>> > > > Test with meta refresh 90000 ms ran around 120 seconds from the 1st poll.
>> > > > Test with meta refresh 500000 ms ran around 564 seconds from the 1st poll.
>> > > >
>> > > > Every test falls in line with the cases above, where the generation is marked dead some time after a meta refresh. A meta refresh before the 1st poll does not create any issue, but the ones after a poll and during long processing do.
>> > > >
>> > > > *Environment:*
>> > > > My setup has 3 brokers and 1 ZooKeeper. The topic has 3 partitions and replication factor 3. Messages are already published to the topic.
>> > > >
>> > > > *Logic used in the test cases:*
>> > > > On each poll I initialize a map with the current committed offset position of the partitions being consumed. I update this map after each record is processed and use it to proactively commit every 15 seconds. The map is initialized again after a proactive commit.
>> > > >
>> > > > I am not sure what is wrong here, but I do not see any issue in the code or the offset commits. Log files and a class with a main method are attached for your reference.
>> > > >
>> > > > Regards,
>> > > > Vinay Sharma
>> > > >
>> > > > On Wed, Apr 27, 2016 at 2:46 PM, Jason Gustafson <ja...@confluent.io> wrote:
>> > > >
>> > > >> Hi Vinay,
>> > > >>
>> > > >> Answers below:
>> > > >>
>> > > >> > 1) Is it correct to say that each commitSync will trigger a HeartbeatTask? If no heartbeat has been sent within the specified heartbeat interval, should I see a successful heartbeat response or a failure message in the logs near the commitSync success log?
>> > > >>
>> > > >> Not quite. Heartbeats are sent periodically according to the heartbeat.interval.ms configuration.
>> > > >> However, since the consumer has no background thread, they can only be sent in API calls such as poll() or commitSync(). So calling commitSync() may or may not result in a heartbeat, depending only on whether one is "due."
>> > > >>
>> > > >> > 2) Is it correct to say that a metadata refresh will not act as a heartbeat, will not trigger the HeartbeatTask and will not reset the HeartbeatTask?
>> > > >>
>> > > >> That is correct. Metadata refreshes are not related to heartbeats.
>> > > >>
>> > > >> > 3) Where is a consumer session maintained? Let's say my consumer is listening to 3 partitions on a 3-broker cluster where each broker is the leader of 1 partition. Will each of the brokers have a session for my consumer, or is there just 1 session maintained somewhere in common, like ZooKeeper?
>> > > >>
>> > > >> One of the brokers serves as the "group coordinator." When the consumer starts up, it sends a GroupCoordinator request to one of the brokers to find out who the coordinator is. Currently, coordinators are chosen from among the leaders of the partitions of the __consumer_offsets topic. This lets us take advantage of the leader election process to also handle coordinator failures. The coordinator of each group maintains state for the group and keeps track of session timeouts.
>> > > >>
>> > > >> > 4) In the above setup, if during long processing I commit a record through commitSync, which triggers a heartbeat request, and a successful response is received for it, what does that response mean? Does it mean that my session with each broker is renewed? Or does it mean that just the leader for the committed record's partition knows that my consumer is alive, and the consumer's session on the other brokers will still time out?
>> > > >>
>> > > >> The coordinator is the only broker that is aware of a consumer's session, and all offset commits are sent to it. Successful heartbeats mean that the session is still active. Heartbeats are also used to let the consumer discover when a rebalance has begun. If a new member joins the group, then the coordinator returns an error code in the heartbeat responses of the active members to let them know that they need to rejoin the group so that partitions can be rebalanced.
>> > > >>
>> > > >> I wouldn't get too hung up on commit/heartbeat behavior. The crux of the issue is that you need to call poll() often enough to avoid getting timed out by the coordinator. If you find this happening frequently, you probably need to increase session.timeout.ms. There's not really any downside to doing so, other than that hard failures (in which the consumer can't be shut down cleanly) will take a little longer to detect. Normal shutdown doesn't have this problem. It can be difficult in 0.9 to ensure that poll() is called often enough since you don't have direct control over the amount of data returned in poll(), but we're adding an option (max.poll.records) in 0.10 which hopefully can be set conservatively enough to make this problem go away.
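>> > > >>
>> > > >> Purely as an illustration, the kind of consumer settings I mean would look something like the fragment below. The values are examples only, not tuned recommendations; max.poll.records requires the 0.10 client, and session.timeout.ms has to stay within the broker-side group.max.session.timeout.ms, which you may need to raise as well.
>> > > >>
>> > > >> Properties props = new Properties();
>> > > >> // More headroom for slow processing between polls (example value only;
>> > > >> // must be <= the brokers' group.max.session.timeout.ms).
>> > > >> props.put("session.timeout.ms", "60000");
>> > > >> props.put("heartbeat.interval.ms", "3000");
>> > > >> // 0.10 only: cap how many records a single poll() can return,
>> > > >> // so each poll's processing stays well under the session timeout.
>> > > >> props.put("max.poll.records", "100");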
>> > > >>
>> > > >> -Jason
>> > > >>
>> > > >> On Wed, Apr 27, 2016 at 7:11 AM, vinay sharma <vinsharma.t...@gmail.com> wrote:
>> > > >>
>> > > >> > Hey,
>> > > >> >
>> > > >> > I am working on a simplified test case to check whether there is any issue in my code. Just to make sure that none of my assumptions are wrong, it would be great if you could help me find answers to the following queries:
>> > > >> >
>> > > >> > 1) Is it correct to say that each commitSync will trigger a HeartbeatTask? If no heartbeat has been sent within the specified heartbeat interval, should I see a successful heartbeat response or a failure message in the logs near the commitSync success log?
>> > > >> > 2) Is it correct to say that a metadata refresh will not act as a heartbeat, will not trigger the HeartbeatTask and will not reset the HeartbeatTask?
>> > > >> > 3) Where is a consumer session maintained? Let's say my consumer is listening to 3 partitions on a 3-broker cluster where each broker is the leader of 1 partition. Will each of the brokers have a session for my consumer, or is there just 1 session maintained somewhere in common, like ZooKeeper?
>> > > >> > 4) In the above setup, if during long processing I commit a record through commitSync, which triggers a heartbeat request, and a successful response is received for it, what does that response mean? Does it mean that my session with each broker is renewed? Or does it mean that just the leader for the committed record's partition knows that my consumer is alive, and the consumer's session on the other brokers will still time out?
>> > > >> >
>> > > >> > Regards,
>> > > >> > Vinay Sharma
>> > > >> >
>> > > >> > On Tue, Apr 26, 2016 at 2:38 PM, Jason Gustafson <ja...@confluent.io> wrote:
>> > > >> >
>> > > >> > > Hey Vinay,
>> > > >> > >
>> > > >> > > Are you saying that heartbeats are not sent while a metadata refresh is in progress? Do you have any logs which show us the apparent problem?
>> > > >> > >
>> > > >> > > Thanks,
>> > > >> > > Jason
>> > > >> > >
>> > > >> > > On Tue, Apr 26, 2016 at 8:18 AM, vinay sharma <vinsharma.t...@gmail.com> wrote:
>> > > >> > >
>> > > >> > > > Hi Ismael,
>> > > >> > > >
>> > > >> > > > Treating commitSync as a heartbeat would definitely resolve the issue I am facing, but the reason behind my issue does not seem to be what is mentioned in the defect (i.e. frequent commitSync requests).
>> > > >> > > >
>> > > >> > > > I am sending commitSync requests periodically only to keep my session alive when my consumer is still processing records and is close to the session timeout (I tried the 10th / 12th / 15th / 20th second after poll was called, with a session timeout of 30). I see a heartbeat response received in the logs along with each commitSync call, but this stops after a metadata refresh request is issued. After the meta refresh and until the next poll, the logs show that commits succeed but there is no "heartbeat response received" message.
>> > > >> > > >
>> > > >> > > > Regards,
>> > > >> > > > Vinay Sharma
>> > > >> > > >
>> > > >> > > > On Mon, Apr 25, 2016 at 5:06 PM, Ismael Juma <ism...@juma.me.uk> wrote:
>> > > >> > > >
>> > > >> > > > > Hi Vinay,
>> > > >> > > > >
>> > > >> > > > > This was fixed via https://issues.apache.org/jira/browse/KAFKA-3470 (it will be part of 0.10.0.0).
>> > > >> > > > >
>> > > >> > > > > Ismael
>> > > >> > > > >
>> > > >> > > > > On Mon, Apr 25, 2016 at 1:52 PM, vinay sharma <vinsharma.t...@gmail.com> wrote:
>> > > >> > > > >
>> > > >> > > > > > Hello,
>> > > >> > > > > >
>> > > >> > > > > > I am using client API 0.9.0.1 and facing an issue. As per my logs, it seems that on each commitSync(offsets) a heartbeat request is sent, but after a metadata refresh request and until the next poll(), commits do not send any heartbeat request.
>> > > >> > > > > >
>> > > >> > > > > > The KafkaConsumers I create sometimes get a session timeout due to no heartbeat, especially during longer processing times. I call commitSync(offsets) at regular intervals to keep the session alive when processing takes longer than usual. Everything works fine if the commit intervals are very small or if I commit after each record, but if I commit, say, every 12 seconds with a session timeout of 30 seconds, then I can see the consumer getting timed out sometimes.
>> > > >> > > > > >
>> > > >> > > > > > Any help or pointers will be much appreciated. Thanks in advance.
>> > > >> > > > > >
>> > > >> > > > > > Regards,
>> > > >> > > > > > Vinay Sharma