Re: No Heartbeat request on commit

vinay sharma Thu, 28 Apr 2016 14:56:24 -0700

I sent it as an attachment. I did not zip it, so it may have been rejected by
the mail server. I will send it again shortly.
On Apr 28, 2016 5:11 PM, "Jason Gustafson" <[email protected]> wrote:


> Hey Vinay,
>
> Did you forget to attach the simple class?
>
> -Jason
>
> On Thu, Apr 28, 2016 at 1:14 PM, vinay sharma <[email protected]>
> wrote:
>
> > I was also wondering: if commitSync acts as a heartbeat, why do we still
> > trigger a heartbeat request on commit? Why not just reset the heartbeat
> > timer on a successful commitSync? Or am I wrong and we do this already?
> >
> > Regards,
> > Vinay
> >
> > On Thu, Apr 28, 2016 at 3:38 PM, vinay sharma <[email protected]>
> > wrote:
> >
> > > Hi Jason,
> > >
> > > Attached is a simple class with a main method. I used it to reproduce
> > > the issue and generate the logs that I attached earlier. The class
> > > contains the poller code snippets relevant to the issue.
> > >
> > > Regards,
> > > Vinay Sharma
> > >
> > > On Thu, Apr 28, 2016 at 3:30 PM, Jason Gustafson <[email protected]>
> > > wrote:
> > >
> > >> Hey Vinay,
> > >>
> > >> Thanks, that's really helpful. It does seem like there might be a
> > >> problem with the heartbeat trigger logic. I'll see if I can reproduce
> > >> what you're seeing locally. Might be helpful if you share a snippet of
> > >> your poll loop.
> > >>
> > >> Thanks,
> > >> Jason
> > >>
> > >> On Thu, Apr 28, 2016 at 11:55 AM, vinay sharma <[email protected]>
> > >> wrote:
> > >>
> > >> > Hi Jason,
> > >> >
> > >> > I reverted to KAFKA-3149. The producer still had schema-related
> > >> > issues, but my consumer worked.
> > >> >
> > >> > The consumer now works as expected. I did not encounter an error and
> > >> > the generation was not marked dead by the coordinator, but successful
> > >> > heartbeat responses are still not logged as expected.
> > >> > My observations are as follows:
> > >> > 1) A meta refresh also triggers a heartbeat request. I say this
> > >> > because I sometimes see 2 heartbeat responses logged just a few
> > >> > milliseconds apart where a meta refresh and a proactive commit
> > >> > happened almost simultaneously.
> > >> > 2) Some commitSync requests still have no heartbeat logged before or
> > >> > after the commit. In this case the next proactive commit happened
> > >> > just in time, and its heartbeat request succeeded and saved the
> > >> > session. In the attached log you can see that a poll was done at
> > >> > 14:17:41, a commit happened at 14:17:56 and another commit happened
> > >> > at 14:18:14. The only heartbeat response logged during this time is
> > >> > at 14:18:14, which is 29 seconds after the poll, whereas a commit was
> > >> > performed 15 seconds after the poll. Heartbeat interval was 3000.
> > >> > 3) There are long pauses between heartbeat responses in the logs
> > >> > which should cause the session to time out, but that is not
> > >> > happening. This implies that commits not only trigger a heartbeat
> > >> > but also act as one.
> > >> >
> > >> >
> > >> > Regards,
> > >> > Vinay
> > >> >
> > >> >
> > >> > On Thu, Apr 28, 2016 at 12:29 PM, Jason Gustafson <[email protected]>
> > >> > wrote:
> > >> >
> > >> >> Ah, yeah. That's probably caused by the new topic metadata version,
> > >> >> which isn't supported on 0.9 brokers. To test on trunk, you'd have to
> > >> >> upgrade the brokers as well. Either that or you can rewind to before
> > >> >> KAFKA-3306 (which was just committed the day before yesterday)?
> > >> >>
> > >> >> -Jason
> > >> >>
> > >> >> On Thu, Apr 28, 2016 at 9:01 AM, vinay sharma <[email protected]>
> > >> >> wrote:
> > >> >>
> > >> >> > Hi Jason,
> > >> >> >
> > >> >> > I built the Kafka client and tried using it, but my producers and
> > >> >> > consumers started throwing the exception below. Is 0.10 not going to
> > >> >> > be compatible with brokers on version 0.9.0.1? Do I need to make some
> > >> >> > config changes to producers / consumers to make them compatible with
> > >> >> > brokers on the old version? Or do I need to upgrade the brokers to
> > >> >> > the new version as well?
> > >> >> >
> > >> >> > org.apache.kafka.common.protocol.types.SchemaException: Error
> > >> >> > reading field 'brokers': Error reading field 'host': Error reading
> > >> >> > string of length 17995, only 145 bytes available
> > >> >> > at org.apache.kafka.common.protocol.types.Schema.read(Schema.java:75)
> > >> >> > at org.apache.kafka.clients.NetworkClient.parseResponse(NetworkClient.java:380)
> > >> >> > at org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:449)
> > >> >> >
> > >> >> > Regards,
> > >> >> > Vinay Sharma
> > >> >> >
> > >> >> > On Thu, Apr 28, 2016 at 12:32 AM, Jason Gustafson <[email protected]>
> > >> >> > wrote:
> > >> >> >
> > >> >> > > Hey Vinay,
> > >> >> > >
> > >> >> > > Any chance you can run the same test against trunk? I'm guessing
> > >> >> > > this might be caused by a bug in the 0.9 consumer which basically
> > >> >> > > causes some requests to fail when a bunch of them are sent to the
> > >> >> > > broker at the same time.
> > >> >> > >
> > >> >> > > -Jason
> > >> >> > >
> > >> >> > > On Wed, Apr 27, 2016 at 1:02 PM, vinay sharma <[email protected]>
> > >> >> > > wrote:
> > >> >> > >
> > >> >> > > > Hi Jason,
> > >> >> > > >
> > >> >> > > > This makes sense. We use 0.9.0.1 and have the session timeout set
> > >> >> > > > a bit high, but nothing can guarantee that processing will never
> > >> >> > > > take longer than the session timeout. I am trying to test a
> > >> >> > > > proactive commit approach to handle cases where processing takes
> > >> >> > > > an unusually long time. To keep the consumer's session alive
> > >> >> > > > during long processing, I proactively commitSync processed records
> > >> >> > > > every 15 seconds. The session timeout I used is 30000.
> > >> >> > > >
> > >> >> > > > *Problem:-*
> > >> >> > > > With a heartbeat interval of 3000, I expect a heartbeat request to
> > >> >> > > > be sent on each proactive commit, which happens every 15 seconds.
> > >> >> > > > In my tests I see that this does not always happen. I see a time
> > >> >> > > > window greater than 30 seconds in which no heartbeat is sent even
> > >> >> > > > though there were commits during it. After this window I see a
> > >> >> > > > couple of successful heartbeat responses until the end of the
> > >> >> > > > poll, but as soon as I poll again and call commitSync in the next
> > >> >> > > > poll I get an "ILLEGAL_GENERATION" error. This error always
> > >> >> > > > happens just after a meta refresh or in the next poll's processing
> > >> >> > > > after a meta refresh. I am attaching logs where I set the meta
> > >> >> > > > refresh interval to 40000, 90000 and 500000.
> > >> >> > > >
> > >> >> > > > *Test results *:-
> > >> >> > > > Test with meta refresh 40000 ms ran around 70 seconds from 1st poll.
> > >> >> > > > Test with meta refresh 90000 ms ran around 120 seconds from 1st poll.
> > >> >> > > > Test with meta refresh 500000 ms ran around 564 seconds from 1st poll.
> > >> >> > > >
> > >> >> > > > Every test falls in line with the above cases: the generation is
> > >> >> > > > marked dead some time after a meta refresh. A meta refresh before
> > >> >> > > > the 1st poll does not create any issue, but the ones after a poll
> > >> >> > > > and during long processing do.
> > >> >> > > >
> > >> >> > > > *Environment:-*
> > >> >> > > > My setup has 3 brokers and 1 ZooKeeper node. The topic has 3
> > >> >> > > > partitions and a replication factor of 3. Messages are already
> > >> >> > > > published to the topic.
> > >> >> > > >
> > >> >> > > > *Logic used in test cases :- *
> > >> >> > > > On each poll I initialize a map with the current committed offset
> > >> >> > > > position of the partitions being consumed. I update this map after
> > >> >> > > > processing each record and use it to proactively commit every 15
> > >> >> > > > seconds. The map is initialized again after a proactive commit.
> > >> >> > > >
> > >> >> > > > I am not sure what is wrong here, but I do not see any issue in
> > >> >> > > > the code or in the offset commits. Log files and a class with a
> > >> >> > > > main method are attached for your reference.
> > >> >> > > >
> > >> >> > > > Regards,
> > >> >> > > > Vinay Sharma
> > >> >> > > >
> > >> >> > > >
> > >> >> > > >
> > >> >> > > > On Wed, Apr 27, 2016 at 2:46 PM, Jason Gustafson <[email protected]>
> > >> >> > > > wrote:
> > >> >> > > >
> > >> >> > > >> Hi Vinay,
> > >> >> > > >>
> > >> >> > > >> Answers below:
> > >> >> > > >>
> > >> >> > > >> > 1) Is it correct to say that each commitSync will trigger a
> > >> >> > > >> > HeartBeatTask? If no heartbeat has been sent within the specified
> > >> >> > > >> > heartbeat interval, should I then see a successful heartbeat
> > >> >> > > >> > response or a failure message in the logs near the commitSync
> > >> >> > > >> > success log?
> > >> >> > > >>
> > >> >> > > >>
> > >> >> > > >> Not quite. Heartbeats are sent periodically according to the
> > >> >> > > >> heartbeat.interval.ms configuration. However, since the consumer
> > >> >> > > >> has no background thread, they can only be sent in API calls such
> > >> >> > > >> as poll() or commitSync(). So calling commitSync() may or may not
> > >> >> > > >> result in a heartbeat depending only on whether one is "due."
> > >> >> > > >>
> > >> >> > > >> > 2) Is it correct to say that a metadata refresh will not act as a
> > >> >> > > >> > heartbeat, will not trigger the HeartBeatTask and will not reset
> > >> >> > > >> > the HeartBeatTask?
> > >> >> > > >>
> > >> >> > > >>
> > >> >> > > >> That is correct. Metadata refreshes are not related to heartbeats.
> > >> >> > > >>
> > >> >> > > >> > 3) Where is a consumer session maintained? Let's say my consumer
> > >> >> > > >> > is listening to 3 partitions on a 3-broker cluster where each
> > >> >> > > >> > broker is the leader of 1 partition. Will each of the brokers
> > >> >> > > >> > have a session for my consumer, or is there just 1 session
> > >> >> > > >> > maintained somewhere in common, like ZooKeeper?
> > >> >> > > >>
> > >> >> > > >>
> > >> >> > > >> One of the brokers serves as the "group coordinator." When the
> > >> >> > > >> consumer starts up, it sends a GroupCoordinator request to one of
> > >> >> > > >> the brokers to find out who the coordinator is. Currently,
> > >> >> > > >> coordinators are chosen from among the leaders of the partitions
> > >> >> > > >> of the __consumer_offsets topic. This lets us take advantage of
> > >> >> > > >> the leader election process to also handle coordinator failures.
> > >> >> > > >> The coordinator of each group maintains state for the group and
> > >> >> > > >> keeps track of session timeouts.
> > >> >> > > >>
> > >> >> > > >> > 4) In the above setup, if during long processing I commit a
> > >> >> > > >> > record through commitSync, which triggers a heartbeat request,
> > >> >> > > >> > and a successful response is received for it, what does this
> > >> >> > > >> > response mean? Does it mean that my session with each broker is
> > >> >> > > >> > renewed? Or does it mean that just the leader for the partition
> > >> >> > > >> > of the committed record knows that my consumer is alive, and the
> > >> >> > > >> > consumer's session on the other brokers will still time out?
> > >> >> > > >>
> > >> >> > > >>
> > >> >> > > >> The coordinator is the only broker that is aware of a consumer's
> > >> >> > > >> session and all offset commits are sent to it. Successful
> > >> >> > > >> heartbeats mean that the session is still active. Heartbeats are
> > >> >> > > >> also used to let the consumer discover when a rebalance has begun.
> > >> >> > > >> If a new member joins the group, then the coordinator returns an
> > >> >> > > >> error code in the heartbeat responses of the active members to let
> > >> >> > > >> them know that they need to rejoin the group so that partitions
> > >> >> > > >> can be rebalanced.
> > >> >> > > >>
> > >> >> > > >> I wouldn't get too hung up on commit/heartbeat behavior. The crux
> > >> >> > > >> of the issue is that you need to call poll() often enough to avoid
> > >> >> > > >> getting timed out by the coordinator. If you find this happening
> > >> >> > > >> frequently, you probably need to increase session.timeout.ms.
> > >> >> > > >> There's not really any downside to doing so other than that hard
> > >> >> > > >> failures (in which the consumer can't be shut down cleanly) will
> > >> >> > > >> take a little longer to detect. Normal shutdown doesn't have this
> > >> >> > > >> problem. It can be difficult in 0.9 to ensure that poll() is
> > >> >> > > >> called often enough since you don't have direct control over the
> > >> >> > > >> amount of data returned in poll(), but we're adding an option
> > >> >> > > >> (max.poll.records) in 0.10 which hopefully can be set
> > >> >> > > >> conservatively enough to make this problem go away.
> > >> >> > > >>
> > >> >> > > >> -Jason
> > >> >> > > >>
> > >> >> > > >> On Wed, Apr 27, 2016 at 7:11 AM, vinay sharma <[email protected]>
> > >> >> > > >> wrote:
> > >> >> > > >>
> > >> >> > > >> > Hey,
> > >> >> > > >> >
> > >> >> > > >> > I am working on a simplified test case to check whether there is
> > >> >> > > >> > any issue in my code. To make sure that none of my assumptions
> > >> >> > > >> > are wrong, it would be great if you could help me find answers
> > >> >> > > >> > to the following queries:
> > >> >> > > >> >
> > >> >> > > >> > 1) Is it correct to say that each commitSync will trigger a
> > >> >> > > >> > HeartBeatTask? If no heartbeat has been sent within the specified
> > >> >> > > >> > heartbeat interval, should I then see a successful heartbeat
> > >> >> > > >> > response or a failure message in the logs near the commitSync
> > >> >> > > >> > success log?
> > >> >> > > >> > 2) Is it correct to say that a metadata refresh will not act as a
> > >> >> > > >> > heartbeat, will not trigger the HeartBeatTask and will not reset
> > >> >> > > >> > the HeartBeatTask?
> > >> >> > > >> > 3) Where is a consumer session maintained? Let's say my consumer
> > >> >> > > >> > is listening to 3 partitions on a 3-broker cluster where each
> > >> >> > > >> > broker is the leader of 1 partition. Will each of the brokers
> > >> >> > > >> > have a session for my consumer, or is there just 1 session
> > >> >> > > >> > maintained somewhere in common, like ZooKeeper?
> > >> >> > > >> > 4) In the above setup, if during long processing I commit a
> > >> >> > > >> > record through commitSync, which triggers a heartbeat request,
> > >> >> > > >> > and a successful response is received for it, what does this
> > >> >> > > >> > response mean? Does it mean that my session with each broker is
> > >> >> > > >> > renewed? Or does it mean that just the leader for the partition
> > >> >> > > >> > of the committed record knows that my consumer is alive, and the
> > >> >> > > >> > consumer's session on the other brokers will still time out?
> > >> >> > > >> >
> > >> >> > > >> > Regards,
> > >> >> > > >> > Vinay Sharma
> > >> >> > > >> >
> > >> >> > > >> > On Tue, Apr 26, 2016 at 2:38 PM, Jason Gustafson <[email protected]>
> > >> >> > > >> > wrote:
> > >> >> > > >> >
> > >> >> > > >> > > Hey Vinay,
> > >> >> > > >> > >
> > >> >> > > >> > > Are you saying that heartbeats are not sent while a metadata
> > >> >> > > >> > > refresh is in progress? Do you have any logs which show us the
> > >> >> > > >> > > apparent problem?
> > >> >> > > >> > >
> > >> >> > > >> > > Thanks,
> > >> >> > > >> > > Jason
> > >> >> > > >> > >
> > >> >> > > >> > > On Tue, Apr 26, 2016 at 8:18 AM, vinay sharma <[email protected]>
> > >> >> > > >> > > wrote:
> > >> >> > > >> > >
> > >> >> > > >> > > > Hi Ismael,
> > >> >> > > >> > > >
> > >> >> > > >> > > > Treating commitSync as a heartbeat will definitely resolve the
> > >> >> > > >> > > > issue I am facing, but the reason behind my issue does not seem
> > >> >> > > >> > > > to be what is mentioned in the defect (i.e. frequent commitSync
> > >> >> > > >> > > > requests).
> > >> >> > > >> > > >
> > >> >> > > >> > > > I am sending commitSync periodically only to keep my session
> > >> >> > > >> > > > alive when my consumer is still processing records and is close
> > >> >> > > >> > > > to the session timeout (tried the 10th / 12th / 15th / 20th
> > >> >> > > >> > > > second after poll was called, where the session timeout is 30
> > >> >> > > >> > > > seconds). I see a heartbeat response received in the logs along
> > >> >> > > >> > > > with each commitSync call, but this stops after a metadata
> > >> >> > > >> > > > refresh request is issued. I see in the logs that the commit
> > >> >> > > >> > > > succeeds, but there is no "heartbeat response received" message
> > >> >> > > >> > > > in the logs after the meta refresh until the next poll.
> > >> >> > > >> > > >
> > >> >> > > >> > > > Regards,
> > >> >> > > >> > > > Vinay Sharma
> > >> >> > > >> > > >
> > >> >> > > >> > > > On Mon, Apr 25, 2016 at 5:06 PM, Ismael Juma <[email protected]>
> > >> >> > > >> > > > wrote:
> > >> >> > > >> > > >
> > >> >> > > >> > > > > Hi Vinay,
> > >> >> > > >> > > > >
> > >> >> > > >> > > > > This was fixed via
> > >> >> > > >> > > > > https://issues.apache.org/jira/browse/KAFKA-3470 (will be part
> > >> >> > > >> > > > > of 0.10.0.0).
> > >> >> > > >> > > > >
> > >> >> > > >> > > > > Ismael
> > >> >> > > >> > > > >
> > >> >> > > >> > > > >
> > >> >> > > >> > > > >
> > >> >> > > >> > > > > On Mon, Apr 25, 2016 at 1:52 PM, vinay sharma <[email protected]>
> > >> >> > > >> > > > > wrote:
> > >> >> > > >> > > > >
> > >> >> > > >> > > > > > Hello,
> > >> >> > > >> > > > > >
> > >> >> > > >> > > > > > I am using client API 0.9.0.1 and facing an issue. As per my
> > >> >> > > >> > > > > > logs, it seems that a heartbeat request is sent on each
> > >> >> > > >> > > > > > commitSync(offsets), but after a metadata refresh request,
> > >> >> > > >> > > > > > commits do not send any heartbeat request until the next
> > >> >> > > >> > > > > > poll().
> > >> >> > > >> > > > > >
> > >> >> > > >> > > > > > The KafkaConsumers I create sometimes get a session timeout
> > >> >> > > >> > > > > > due to missing heartbeats, especially during longer
> > >> >> > > >> > > > > > processing times. I call commitSync(offsets) at regular
> > >> >> > > >> > > > > > intervals to keep the session alive when processing takes
> > >> >> > > >> > > > > > longer than usual. Everything works fine if the commit
> > >> >> > > >> > > > > > intervals are very small or if I commit after each record,
> > >> >> > > >> > > > > > but if I commit, let's say, every 12 seconds with a session
> > >> >> > > >> > > > > > timeout of 30 seconds, then I can see the consumer getting
> > >> >> > > >> > > > > > timed out sometimes.
> > >> >> > > >> > > > > >
> > >> >> > > >> > > > > > Any help or pointers will be much appreciated. Thanks in
> > >> >> > > >> > > > > > advance.
> > >> >> > > >> > > > > >
> > >> >> > > >> > > > > > Regards,
> > >> >> > > >> > > > > > Vinay sharma
> > >> >> > > >> > > > > >
> > >> >> > > >> > > > >
> > >> >> > > >> > > >
> > >> >> > > >> > >
> > >> >> > > >> >
> > >> >> > > >>
> > >> >> > > >
> > >> >> > > >
> > >> >> > >
> > >> >> >
> > >> >>
> > >> >
> > >> >
> > >>
> > >
> > >
> >
>
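
For anyone following the thread, here is a small sketch of the timing rule Jason describes in his answer to question 1: a heartbeat can only go out during an API call such as poll() or commitSync(), and only if one is "due". This is a toy model, not Kafka client code. The class HeartbeatModel and its methods are made up for illustration, and it assumes the session is considered expired once no heartbeat has been sent for longer than the session timeout.

```java
/**
 * Toy model of the heartbeat timing discussed in this thread.
 * NOT Kafka client code: all names here are hypothetical.
 */
public class HeartbeatModel {
    private final long heartbeatIntervalMs; // plays the role of heartbeat.interval.ms
    private final long sessionTimeoutMs;    // plays the role of session.timeout.ms
    private long lastHeartbeatMs;

    public HeartbeatModel(long heartbeatIntervalMs, long sessionTimeoutMs, long startMs) {
        this.heartbeatIntervalMs = heartbeatIntervalMs;
        this.sessionTimeoutMs = sessionTimeoutMs;
        this.lastHeartbeatMs = startMs;
    }

    /**
     * Models an API call (poll() or commitSync()) made at time nowMs.
     * A heartbeat is sent only if one is due; returns true if it was sent.
     */
    public boolean onApiCall(long nowMs) {
        if (nowMs - lastHeartbeatMs >= heartbeatIntervalMs) {
            lastHeartbeatMs = nowMs;
            return true;
        }
        return false;
    }

    /** Assumption: the session expires once the gap exceeds the session timeout. */
    public boolean sessionExpired(long nowMs) {
        return nowMs - lastHeartbeatMs > sessionTimeoutMs;
    }

    public static void main(String[] args) {
        // Values from the thread: heartbeat interval 3000, session timeout 30000.
        HeartbeatModel model = new HeartbeatModel(3000, 30000, 0);
        System.out.println(model.onApiCall(1000));       // false: not due yet
        System.out.println(model.onApiCall(15000));      // true: due, heartbeat sent
        System.out.println(model.sessionExpired(30000)); // false: 15 s since last heartbeat
        System.out.println(model.sessionExpired(45001)); // true: more than 30 s of silence
    }
}
```

Under this idealized rule, a commit every 15 seconds would always carry a heartbeat and the session would never expire, which is what Vinay expected to see. His logs show gaps that this model does not predict, and that is the discrepancy Jason offers to reproduce.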

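The proactive commit logic Vinay describes (a per-partition offset map, updated after each record and committed every 15 seconds) could look roughly like the sketch below. The attached class is not part of this archive, so this is only a guess at its shape. ProactiveCommitTracker and its callback are hypothetical names, partitions are plain integers, and the map stores the last processed offset plus one because a committed offset in Kafka is the position of the next record to read.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

/**
 * Sketch of a time-based proactive commit helper.
 * Hypothetical: not the class attached to the original mail.
 */
public class ProactiveCommitTracker {
    private final long commitIntervalMs;
    private final Consumer<Map<Integer, Long>> committer; // stands in for commitSync(offsets)
    private final Map<Integer, Long> pending = new HashMap<>();
    private long lastCommitMs;

    public ProactiveCommitTracker(long commitIntervalMs, long startMs,
                                  Consumer<Map<Integer, Long>> committer) {
        this.commitIntervalMs = commitIntervalMs;
        this.lastCommitMs = startMs;
        this.committer = committer;
    }

    /**
     * Call after each record is processed. Commits the tracked offsets when the
     * commit interval has elapsed; returns true if a commit was issued.
     */
    public boolean recordProcessed(int partition, long offset, long nowMs) {
        pending.put(partition, offset + 1); // next offset to read for this partition
        if (nowMs - lastCommitMs >= commitIntervalMs) {
            committer.accept(new HashMap<>(pending)); // hand over a copy
            pending.clear();                          // map is re-initialized after a commit
            lastCommitMs = nowMs;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        ProactiveCommitTracker tracker = new ProactiveCommitTracker(
                15000, 0, offsets -> System.out.println("commit " + offsets));
        tracker.recordProcessed(0, 10, 5000);  // too early, nothing committed
        tracker.recordProcessed(1, 7, 15000);  // 15 s elapsed, both partitions committed
    }
}
```

In a real consumer the callback would presumably build a Map<TopicPartition, OffsetAndMetadata> and pass it to KafkaConsumer.commitSync(offsets). That wiring is left out here so the sketch runs without the Kafka client on the classpath.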