I sent it as attachment. I did not zip it and may have been rejected by mail server. Will send again shortly. On Apr 28, 2016 5:11 PM, "Jason Gustafson" <ja...@confluent.io> wrote:
> Hey Vinay, > > Did you forget to attach the simple class? > > -Jason > > On Thu, Apr 28, 2016 at 1:14 PM, vinay sharma <vinsharma.t...@gmail.com> > wrote: > > > I was also wondering that if commitSync acts as heartbeat then why do we > > still trigger heartbeat request on commit? why not just reset its time on > > successful commitSync? or am i wrong and we do this already? > > > > Regards, > > Vinay > > > > On Thu, Apr 28, 2016 at 3:38 PM, vinay sharma <vinsharma.t...@gmail.com> > > wrote: > > > > > Hi Jason, > > > > > > Attached is a simple class with a main method. I used this for > > reproducing > > > issue and generate logs that i attached earlier. This class has code > > > snippets of poller relevant to the issue. > > > > > > Regards, > > > Vinay Sharma > > > > > > On Thu, Apr 28, 2016 at 3:30 PM, Jason Gustafson <ja...@confluent.io> > > > wrote: > > > > > >> Hey Vinay, > > >> > > >> Thanks, that's really helpful. It does seem like there might be a > > problem > > >> with the heartbeat trigger logic. I'll see if I can reproduce what > > you're > > >> seeing locally. Might be helpful if you share a snippet of your poll > > loop. > > >> > > >> Thanks, > > >> Jason > > >> > > >> On Thu, Apr 28, 2016 at 11:55 AM, vinay sharma < > > vinsharma.t...@gmail.com> > > >> wrote: > > >> > > >> > Hi Jason, > > >> > > > >> > i reverted back to KAFKA-3149. Producer still had issues related to > > >> schema > > >> > but my consumer worked. > > >> > > > >> > Now consumer worked as expected. Although i did not encountered an > > error > > >> > and generation was not marked dead by coordinator but i still see > that > > >> > successful heartbeat response are not logged as expected. > > >> > My observation is following:- > > >> > 1) Meta refresh also triggers heartbeat request. I say this because > > >> > sometimes i see 2 heartbeat responses logged just a few milliseconds > > >> away > > >> > where meta refresh and proactive commit happened almost > > simultaneously. > > >> > 2) I still see that some commitSync requests do not have a heartbeat > > >> > logged before or after commit. Although next proactive commit > happened > > >> just > > >> > in time and this time heartbeat request was successful hence saved > > >> session. > > >> > In attached log you can see that poll was done at 14:17:41, a commit > > >> > happened at 14:17:56 and another commit happened at 14:18:14. The > only > > >> > heart beat response logged during this time is at 14:18:14 which is > 29 > > >> > seconds after poll where as a commit was performed 15 seconds after > > >> poll. > > >> > Heartbeat interval was 3000. > > >> > 3) There are long pauses in heartbeat responses in logs which should > > >> cause > > >> > session to timeout but its not happening. This implies that commits > > >> trigger > > >> > a heartbeat but they also act as heartbeat. > > >> > > > >> > > > >> > Regards, > > >> > Vinay > > >> > > > >> > > > >> > On Thu, Apr 28, 2016 at 12:29 PM, Jason Gustafson < > ja...@confluent.io > > > > > >> > wrote: > > >> > > > >> >> Ah, yeah. That's probably caused by the new topic metadata version, > > >> which > > >> >> isn't supported on 0.9 brokers. To test on trunk, you'd have to > > upgrade > > >> >> the > > >> >> brokers as well. Either that or you can rewind to before KAFKA-3306 > > >> (which > > >> >> was just committed the day before yesterday)? > > >> >> > > >> >> -Jason > > >> >> > > >> >> On Thu, Apr 28, 2016 at 9:01 AM, vinay sharma < > > >> vinsharma.t...@gmail.com> > > >> >> wrote: > > >> >> > > >> >> > Hi Jason, > > >> >> > > > >> >> > I build kafka-client and tried using it but my producers and > > >> consumers > > >> >> > started throwing below exception. Is 0.10 not going to be > > compatible > > >> >> with > > >> >> > brokers on version 0.9.0.1? or do i need to make some config > > changes > > >> to > > >> >> > producers / consumers to make them compatible with brokers on old > > >> >> version? > > >> >> > or do i need to upgrade brokers to new version as well? > > >> >> > > > >> >> > org.apache.kafka.common.protocol.types.SchemaException: Error > > >> reading > > >> >> > field 'brokers': Error reading field 'host': Error reading string > > of > > >> >> length > > >> >> > 17995, only 145 bytes available > > >> >> > at > > org.apache.kafka.common.protocol.types.Schema.read(Schema.java:75) > > >> >> > at > > >> >> > > > >> >> > > > >> >> > > >> > > > org.apache.kafka.clients.NetworkClient.parseResponse(NetworkClient.java:380) > > >> >> > at > > >> >> > > > >> >> > > > >> >> > > >> > > > org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:449) > > >> >> > > > >> >> > Regards, > > >> >> > Vinay Sharma > > >> >> > > > >> >> > On Thu, Apr 28, 2016 at 12:32 AM, Jason Gustafson < > > >> ja...@confluent.io> > > >> >> > wrote: > > >> >> > > > >> >> > > Hey Vinay, > > >> >> > > > > >> >> > > Any chance you can run the same test against trunk? I'm > guessing > > >> this > > >> >> > might > > >> >> > > be caused by a bug in the 0.9 consumer which basically causes > > some > > >> >> > requests > > >> >> > > to fail when a bunch of them are sent to the broker at the same > > >> time. > > >> >> > > > > >> >> > > -Jason > > >> >> > > > > >> >> > > On Wed, Apr 27, 2016 at 1:02 PM, vinay sharma < > > >> >> vinsharma.t...@gmail.com> > > >> >> > > wrote: > > >> >> > > > > >> >> > > > Hi Jason, > > >> >> > > > > > >> >> > > > This makes sense.We use 0.9.0.1 and we do have session > timeout > > >> set a > > >> >> > bit > > >> >> > > > high but nothing can guarantee that there will be no case > when > > >> >> > processing > > >> >> > > > may not go higher than session timeout. I am trying to test a > > >> >> proactive > > >> >> > > > commit approach to handle such cases when processing takes > > >> unusually > > >> >> > long > > >> >> > > > time. To keep consumer's session alive during long processing > > >> time i > > >> >> > > > proactively commitSync processed records every 15 seconds. > > >> Session > > >> >> > > timeout > > >> >> > > > i kept is 30000. > > >> >> > > > > > >> >> > > > *Problem:-* > > >> >> > > > With heart beat interval is 3000 then i expect a hearbeat > > request > > >> >> to be > > >> >> > > > sent on each proactive commit which happens every 15 seconds. > > In > > >> my > > >> >> > > tests i > > >> >> > > > see that this does not happen always. I see a time window > which > > >> is > > >> >> > > greater > > >> >> > > > than 30 seconds where no hearbeat is sent even thought there > > were > > >> >> > commits > > >> >> > > > in this duration. After this window i see a couple of > > successful > > >> >> > > heartbeat > > >> >> > > > responses till the end of poll but as soon as i poll again > and > > >> call > > >> >> > > > commitSync in next poll i get "ILLEGAL_GENERATION" error. > This > > >> error > > >> >> > > always > > >> >> > > > happen just after meta refresh or in next poll processing > > after a > > >> >> meta > > >> >> > > > refresh. I am attaching logs where i kept meta refresh > interval > > >> >> 40000, > > >> >> > > > 90000, 500000. > > >> >> > > > > > >> >> > > > *Test results *:- > > >> >> > > > Test with meta refresh 40000 ms ran around 70 seconds from > 1st > > >> poll. > > >> >> > > > Test with meta refresh 90000 ms ran around 120 seconds from > 1st > > >> >> poll. > > >> >> > > > Test with meta refresh 500000 ms ran around 564 seconds from > > 1st > > >> >> poll. > > >> >> > > > > > >> >> > > > Every test falls in line with above test cases where > generation > > >> is > > >> >> > marked > > >> >> > > > dead some time after a meta refresh. Meta refresh before 1st > > poll > > >> >> does > > >> >> > > not > > >> >> > > > create any issue but the ones after poll and during long > > >> processing > > >> >> do. > > >> >> > > > > > >> >> > > > *Environment:-* > > >> >> > > > My setup has 3 brokers 1 zk. Topic has 3 partitions ans has > > >> >> replication > > >> >> > > > factor 3. Messages are already published to topic. > > >> >> > > > > > >> >> > > > *Logic used in test cases :- * > > >> >> > > > On each poll I initialize a map with current committed offset > > >> >> position > > >> >> > of > > >> >> > > > partitions being consumed. I update this map after each > record > > >> >> > processing > > >> >> > > > and use this map to proactively commit every 15 seconds. Map > is > > >> >> > > initialized > > >> >> > > > again after a proactive commit. > > >> >> > > > > > >> >> > > > I am not sure what is wrong here but i do not see any issue > in > > >> code > > >> >> or > > >> >> > > > offset commits going on. Log files and a class with main > method > > >> are > > >> >> > > > attached for your reference. > > >> >> > > > > > >> >> > > > Regards, > > >> >> > > > Vinay Sharma > > >> >> > > > > > >> >> > > > > > >> >> > > > > > >> >> > > > On Wed, Apr 27, 2016 at 2:46 PM, Jason Gustafson < > > >> >> ja...@confluent.io> > > >> >> > > > wrote: > > >> >> > > > > > >> >> > > >> Hi Vinay, > > >> >> > > >> > > >> >> > > >> Answers below: > > >> >> > > >> > > >> >> > > >> 1) Is it correct to say that each commitSync will trigger a > > >> >> > > >> HeartBeatTask? > > >> >> > > >> > If there is no hear beat sent in past since specified > > >> heartbeat > > >> >> > > interval > > >> >> > > >> > then i should see a successful heartbeat response or > failure > > >> >> message > > >> >> > > in > > >> >> > > >> > logs near to commitSync success log? > > >> >> > > >> > > >> >> > > >> > > >> >> > > >> Not quite. Heartbeats are sent periodically according to the > > >> >> > > >> heartbeat.interval.ms configuration. However, since the > > >> consumer > > >> >> has > > >> >> > no > > >> >> > > >> background thread, they can only be sent in API calls such > as > > >> >> poll() > > >> >> > or > > >> >> > > >> commitSync(). So calling commitSync() may or may not result > > in a > > >> >> > > heartbeat > > >> >> > > >> depending only on whether one is "due." > > >> >> > > >> > > >> >> > > >> 2) is it correct to say that Meta Data refresh will not act > as > > >> >> > > heartbeat, > > >> >> > > >> > will not trigger heartBeatTask and will not reset > > >> heartBeatTask? > > >> >> > > >> > > >> >> > > >> > > >> >> > > >> That is correct. Metadata refreshes are not related to > > >> heartbeats. > > >> >> > > >> > > >> >> > > >> 3) Where does a consumer session maintained? Lets say my > > >> consumer > > >> >> is > > >> >> > > >> > listening to 3 partitions on a 3 broker cluster where each > > >> >> broker is > > >> >> > > >> leader > > >> >> > > >> > of 1 partition. So will each of the brokers will have a > > >> session > > >> >> for > > >> >> > my > > >> >> > > >> > consumer or is it just 1 session maintained somewhere in > > >> common > > >> >> like > > >> >> > > >> > zookeeper? > > >> >> > > >> > > >> >> > > >> > > >> >> > > >> One of the brokers serves as the "group coordinator." When > the > > >> >> > consumer > > >> >> > > >> starts up, it sends a GroupCoordinator request to one of the > > >> >> brokers > > >> >> > to > > >> >> > > >> find out who the coordinator is. Currently, coordinators are > > >> chosen > > >> >> > from > > >> >> > > >> among the leaders of the partitions of the > __consumer_offsets > > >> >> topic. > > >> >> > > This > > >> >> > > >> lets us take advantage of the leader election process to > also > > >> >> handle > > >> >> > > >> coordinator failures. The coordinator of each group > maintains > > >> state > > >> >> > for > > >> >> > > >> the > > >> >> > > >> group and keeps track of session timeouts. > > >> >> > > >> > > >> >> > > >> 4) In above setup, during a long processing if I commit a > > record > > >> >> > through > > >> >> > > >> > commmitSync which triggers a hear beat request and a > > >> successful > > >> >> > > >> response is > > >> >> > > >> > received for the same then what does this response means? > > >> does it > > >> >> > mean > > >> >> > > >> that > > >> >> > > >> > my session with each broker is renewed? or does it mean > that > > >> just > > >> >> > the > > >> >> > > >> > leader for partition of committed record knows that my > > >> consumer > > >> >> is > > >> >> > > alive > > >> >> > > >> > and consumer's session on other brokers will still > timeout? > > >> >> > > >> > > >> >> > > >> > > >> >> > > >> The coordinator is the only broker that is aware of a > > consumer's > > >> >> > session > > >> >> > > >> and all offset commits are sent to it. Successful heartbeats > > >> mean > > >> >> that > > >> >> > > the > > >> >> > > >> session is still active. Heartbeats are also used to let the > > >> >> consumer > > >> >> > > >> discover when a rebalance has begun. If a new member joins > the > > >> >> group, > > >> >> > > then > > >> >> > > >> the coordinator returns an error code in the heartbeat > > >> responses of > > >> >> > the > > >> >> > > >> active members to let them know that they need to rejoin the > > >> group > > >> >> so > > >> >> > > that > > >> >> > > >> partitions can be rebalanced. > > >> >> > > >> > > >> >> > > >> I wouldn't get too hung up on commit/heartbeat behavior. The > > >> crux > > >> >> of > > >> >> > the > > >> >> > > >> issue is that you need to call poll() often enough to avoid > > >> getting > > >> >> > > timed > > >> >> > > >> out by the coordinator. If you find this happening > frequently, > > >> you > > >> >> > > >> probably > > >> >> > > >> need to increase session.timeout.ms. There's not really any > > >> >> downside > > >> >> > to > > >> >> > > >> doing so other than that hard failures (in which the > consumer > > >> >> can't be > > >> >> > > >> shutdown cleanly) will take a little longer to detect. > Normal > > >> >> shutdown > > >> >> > > >> doesn't have this problem. It can be difficult in 0.9 to > > ensure > > >> >> that > > >> >> > > >> poll() > > >> >> > > >> is called often enough since you don't have direct control > > over > > >> the > > >> >> > > amount > > >> >> > > >> of data returned in poll(), but we're adding an option > > >> >> > > (max.poll.records) > > >> >> > > >> in 0.10 which hopefully can be set conservatively enough to > > make > > >> >> this > > >> >> > > >> problem go away. > > >> >> > > >> > > >> >> > > >> -Jason > > >> >> > > >> > > >> >> > > >> On Wed, Apr 27, 2016 at 7:11 AM, vinay sharma < > > >> >> > vinsharma.t...@gmail.com > > >> >> > > > > > >> >> > > >> wrote: > > >> >> > > >> > > >> >> > > >> > Hey, > > >> >> > > >> > > > >> >> > > >> > I am working on a simplified test case to check if there > is > > >> any > > >> >> > issue > > >> >> > > >> in my > > >> >> > > >> > code. Just to make sure that any of my assumptions are not > > >> >> wrong, it > > >> >> > > >> will > > >> >> > > >> > be great if you can please help me in finding answers to > > >> >> following > > >> >> > > >> > queries:- > > >> >> > > >> > > > >> >> > > >> > 1) Is it correct to say that each commitSync will > trigger a > > >> >> > > >> HeartBeatTask? > > >> >> > > >> > If there is no hear beat sent in past since specified > > >> heartbeat > > >> >> > > interval > > >> >> > > >> > then i should see a successful heartbeat response or > failure > > >> >> message > > >> >> > > in > > >> >> > > >> > logs near to commitSync success log? > > >> >> > > >> > 2) is it correct to say that Meta Data refresh will not > act > > as > > >> >> > > >> heartbeat, > > >> >> > > >> > will not trigger heartBeatTask and will not reset > > >> heartBeatTask? > > >> >> > > >> > 3) Where does a consumer session maintained? Lets say my > > >> >> consumer is > > >> >> > > >> > listening to 3 partitions on a 3 broker cluster where each > > >> >> broker is > > >> >> > > >> leader > > >> >> > > >> > of 1 partition. So will each of the brokers will have a > > >> session > > >> >> for > > >> >> > my > > >> >> > > >> > consumer or is it just 1 session maintained somewhere in > > >> common > > >> >> like > > >> >> > > >> > zookeeper? > > >> >> > > >> > 4) In above setup, during a long processing if I commit a > > >> record > > >> >> > > through > > >> >> > > >> > commmitSync which triggers a hear beat request and a > > >> successful > > >> >> > > >> response is > > >> >> > > >> > received for the same then what does this response means? > > >> does it > > >> >> > mean > > >> >> > > >> that > > >> >> > > >> > my session with each broker is renewed? or does it mean > that > > >> just > > >> >> > the > > >> >> > > >> > leader for partition of committed record knows that my > > >> consumer > > >> >> is > > >> >> > > alive > > >> >> > > >> > and consumer's session on other brokers will still > timeout? > > >> >> > > >> > > > >> >> > > >> > Regards, > > >> >> > > >> > Vinay Sharma > > >> >> > > >> > > > >> >> > > >> > On Tue, Apr 26, 2016 at 2:38 PM, Jason Gustafson < > > >> >> > ja...@confluent.io> > > >> >> > > >> > wrote: > > >> >> > > >> > > > >> >> > > >> > > Hey Vinay, > > >> >> > > >> > > > > >> >> > > >> > > Are you saying that heartbeats are not sent while a > > metadata > > >> >> > refresh > > >> >> > > >> is > > >> >> > > >> > in > > >> >> > > >> > > progress? Do you have any logs which show us the > apparent > > >> >> problem? > > >> >> > > >> > > > > >> >> > > >> > > Thanks, > > >> >> > > >> > > Jason > > >> >> > > >> > > > > >> >> > > >> > > On Tue, Apr 26, 2016 at 8:18 AM, vinay sharma < > > >> >> > > >> vinsharma.t...@gmail.com> > > >> >> > > >> > > wrote: > > >> >> > > >> > > > > >> >> > > >> > > > Hi Ismael, > > >> >> > > >> > > > > > >> >> > > >> > > > Treating commitSync as heartbeat will definitely > resolve > > >> the > > >> >> > issue > > >> >> > > >> i am > > >> >> > > >> > > > facing but the reason behind my issue does not seem to > > be > > >> >> what > > >> >> > > >> > mentioned > > >> >> > > >> > > in > > >> >> > > >> > > > defect (i.e frequent commitSync requests). > > >> >> > > >> > > > > > >> >> > > >> > > > I am sending CommitSync periodically only to keep my > > >> session > > >> >> > alive > > >> >> > > >> when > > >> >> > > >> > > my > > >> >> > > >> > > > consumer is still processing records and is close to > > >> session > > >> >> > time > > >> >> > > >> out > > >> >> > > >> > > > (tried 10th / 12th / 15th / 20th second after poll > > called > > >> >> where > > >> >> > > >> session > > >> >> > > >> > > > time is 30). I see heartbeat response received in logs > > >> along > > >> >> > with > > >> >> > > >> each > > >> >> > > >> > > > commitSync call but this stops after a meta data > refresh > > >> >> request > > >> >> > > is > > >> >> > > >> > > issued. > > >> >> > > >> > > > I see in logs that commit goes successful but no > > heartbeat > > >> >> > > response > > >> >> > > >> > > > received message in logs after meta refresh till next > > >> poll. > > >> >> > > >> > > > > > >> >> > > >> > > > Regards, > > >> >> > > >> > > > Vinay Sharma > > >> >> > > >> > > > > > >> >> > > >> > > > On Mon, Apr 25, 2016 at 5:06 PM, Ismael Juma < > > >> >> ism...@juma.me.uk > > >> >> > > > > >> >> > > >> > wrote: > > >> >> > > >> > > > > > >> >> > > >> > > > > Hi Vinay, > > >> >> > > >> > > > > > > >> >> > > >> > > > > This was fixed via > > >> >> > > >> https://issues.apache.org/jira/browse/KAFKA-3470 > > >> >> > > >> > > > (will > > >> >> > > >> > > > > be part of 0.10.0.0). > > >> >> > > >> > > > > > > >> >> > > >> > > > > Ismael > > >> >> > > >> > > > > > > >> >> > > >> > > > > > > >> >> > > >> > > > > > > >> >> > > >> > > > > On Mon, Apr 25, 2016 at 1:52 PM, vinay sharma < > > >> >> > > >> > > vinsharma.t...@gmail.com> > > >> >> > > >> > > > > wrote: > > >> >> > > >> > > > > > > >> >> > > >> > > > > > Hello, > > >> >> > > >> > > > > > > > >> >> > > >> > > > > > I am using client API 0.9.0.1 and facing an issue. > > As > > >> >> per my > > >> >> > > >> logs > > >> >> > > >> > it > > >> >> > > >> > > > > seems > > >> >> > > >> > > > > > that on each commitSync(Offsets) a heartbeat > request > > >> is > > >> >> sent > > >> >> > > but > > >> >> > > >> > > after > > >> >> > > >> > > > a > > >> >> > > >> > > > > > metada refresh request till next poll(), commits > do > > >> not > > >> >> send > > >> >> > > any > > >> >> > > >> > > > hearbeat > > >> >> > > >> > > > > > request. > > >> >> > > >> > > > > > > > >> >> > > >> > > > > > KafkaConsumers i create sometimes get session time > > out > > >> >> due > > >> >> > to > > >> >> > > no > > >> >> > > >> > > > hearbeat > > >> >> > > >> > > > > > specially during longer processing times. I call > > >> >> > > >> > CommitSync(offsets) > > >> >> > > >> > > > > after > > >> >> > > >> > > > > > regular intervals to keep session alive when > > >> processing > > >> >> > takes > > >> >> > > >> > longer > > >> >> > > >> > > > than > > >> >> > > >> > > > > > usual. Every thing works fine if commit intervals > > are > > >> >> very > > >> >> > > >> small or > > >> >> > > >> > > if > > >> >> > > >> > > > i > > >> >> > > >> > > > > > commit after each record but if i commit lets say > > >> every > > >> >> 12 > > >> >> > > >> seconds > > >> >> > > >> > > and > > >> >> > > >> > > > 30 > > >> >> > > >> > > > > > seconds is session time then i can see consumer > > >> getting > > >> >> > timed > > >> >> > > >> out > > >> >> > > >> > > > > > sometimes. > > >> >> > > >> > > > > > > > >> >> > > >> > > > > > Any help or pointers will be much appreciated. > > Thanks > > >> in > > >> >> > > >> advance. > > >> >> > > >> > > > > > > > >> >> > > >> > > > > > Regards, > > >> >> > > >> > > > > > Vinay sharma > > >> >> > > >> > > > > > > > >> >> > > >> > > > > > > >> >> > > >> > > > > > >> >> > > >> > > > > >> >> > > >> > > > >> >> > > >> > > >> >> > > > > > >> >> > > > > > >> >> > > > > >> >> > > > >> >> > > >> > > > >> > > > >> > > > > > > > > >