I've been unable to reproduce this issue running locally. Even with a poll timeout of 1 millisecond, it seems to work as expected. It would be helpful to know a little more about your setup. Are you using SSL? Are the brokers remote? Is the network stable?
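For reference, a minimal loop for this kind of test looks roughly like the following. This is only a sketch: the broker address, group id, topic name, and String deserializers are placeholders rather than anything taken from this thread.

import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class PollTimeoutTest {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder broker
        props.put("group.id", "poll-timeout-test");          // placeholder group
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Arrays.asList("test-topic"));      // placeholder topic

        while (true) {
            // A 1 ms timeout makes poll() return almost immediately, but the
            // loop still calls it often enough to heartbeat well within the
            // session timeout, so the group should stay stable.
            ConsumerRecords<String, String> records = consumer.poll(1);
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("%s-%d@%d: %s%n", record.topic(),
                        record.partition(), record.offset(), record.value());
            }
        }
    }
}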
Thanks,
Jason

On Tue, Dec 1, 2015 at 10:06 AM, Jason Gustafson <ja...@confluent.io> wrote:
> Hi Martin,
>
> I'm also not sure why the poll timeout would affect this. Perhaps the handler is still doing work (e.g. sending requests) when the record set is empty?
>
> As a general rule, I would recommend longer poll timeouts. I've actually tended to use Long.MAX_VALUE myself. I'll have a look just to make sure everything still works with smaller values though.
>
> -Jason
>
> On Tue, Dec 1, 2015 at 2:35 AM, Martin Skøtt <martin.sko...@falconsocial.com> wrote:
>> Hi Jason,
>>
>> That actually sounds like a very plausible explanation. My current consumer is using the default settings, but I have previously used these (taken from the sample in the Javadoc for the new KafkaConsumer):
>> "auto.commit.interval.ms", "1000"
>> "session.timeout.ms", "30000"
>>
>> My consumer loop is quite simple as it just calls a domain-specific service:
>>
>> while (true) {
>>     ConsumerRecords<String, Object> records = consumer.poll(10000);
>>     for (ConsumerRecord<String, Object> record : records) {
>>         serve.handle(record.topic(), record.value());
>>     }
>> }
>>
>> The domain service does a number of things (including lookups in an RDBMS and saving to Elasticsearch). In my local test setup a poll will often return between 5,000 and 10,000 records, and I can easily see the processing of those taking more than 30 seconds.
>>
>> I'll probably take a look at adding some threading to my consumer and adding more partitions to my topics.
>>
>> That is all fine, but it doesn't really explain why increasing the poll timeout made the problem go away :-/
>>
>> Martin
>>
>> On 30 November 2015 at 19:30, Jason Gustafson <ja...@confluent.io> wrote:
>>> Hey Martin,
>>>
>>> At a glance, it looks like your consumer's session timeout is expiring. This shouldn't happen unless there is a delay between successive calls to poll which is longer than the session timeout. It might help if you include a snippet of your poll loop and your configuration (i.e. any overridden settings).
>>>
>>> -Jason
>>>
>>> On Mon, Nov 30, 2015 at 8:12 AM, Martin Skøtt <martin.sko...@falconsocial.com> wrote:
>>>> Well, I made the problem go away, but I'm not sure why it works :-/
>>>>
>>>> Previously I used a timeout value of 100 for Consumer.poll(). Increasing it to 10,000 makes the problem go away completely?! I tried other values as well:
>>>> - 0: problem remained
>>>> - 3000 (same as heartbeat.interval): problem remained, but less frequent
>>>>
>>>> Not really sure what is going on, but happy that the problem went away :-)
>>>>
>>>> Martin
>>>>
>>>> On 30 November 2015 at 15:33, Martin Skøtt <martin.sko...@falconsocial.com> wrote:
>>>>> Hi Guozhang,
>>>>>
>>>>> I have done some testing with various values of heartbeat.interval.ms and they don't seem to have any influence on the error messages. Running kafka-consumer-groups also continues to report that the consumer group does not exist or is rebalancing. Do you have any suggestions for how I could debug this further?
>>>>>
>>>>> Regards,
>>>>> Martin
>>>>>
>>>>> On 25 November 2015 at 18:37, Guozhang Wang <wangg...@gmail.com> wrote:
>>>>>> Hello Martin,
>>>>>>
>>>>>> It seems your consumer's heartbeat.interval.ms config value is too small (default is 3 seconds) for your environment, consider increasing it and see if this issue goes away.
>>>>>>
>>>>>> At the same time, we have some better error handling fixes in trunk which will be included in the next point release.
>>>>>>
>>>>>> https://issues.apache.org/jira/browse/KAFKA-2860
>>>>>>
>>>>>> Guozhang
>>>>>>
>>>>>> On Wed, Nov 25, 2015 at 6:54 AM, Martin Skøtt <martin.sko...@falconsocial.com> wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> I'm experiencing some very strange issues with 0.9. I get these log messages from the new consumer:
>>>>>>>
>>>>>>> [main] ERROR org.apache.kafka.clients.consumer.internals.ConsumerCoordinator - Error ILLEGAL_GENERATION occurred while committing offsets for group aaa-bbb-reader
>>>>>>> [main] WARN org.apache.kafka.clients.consumer.internals.ConsumerCoordinator - Auto offset commit failed: Commit cannot be completed due to group rebalance
>>>>>>> [main] ERROR org.apache.kafka.clients.consumer.internals.ConsumerCoordinator - Error ILLEGAL_GENERATION occurred while committing offsets for group aaa-bbb-reader
>>>>>>> [main] WARN org.apache.kafka.clients.consumer.internals.ConsumerCoordinator - Auto offset commit failed:
>>>>>>> [main] INFO org.apache.kafka.clients.consumer.internals.AbstractCoordinator - Attempt to join group aaa-bbb-reader failed due to unknown member id, resetting and retrying.
>>>>>>>
>>>>>>> And this in the broker log:
>>>>>>>
>>>>>>> [2015-11-25 15:41:01,542] INFO [GroupCoordinator 0]: Preparing to restabilize group aaa-bbb-reader with old generation 1 (kafka.coordinator.GroupCoordinator)
>>>>>>> [2015-11-25 15:41:01,544] INFO [GroupCoordinator 0]: Group aaa-bbb-reader generation 1 is dead and removed (kafka.coordinator.GroupCoordinator)
>>>>>>> [2015-11-25 15:41:13,474] INFO [GroupCoordinator 0]: Preparing to restabilize group aaa-bbb-reader with old generation 0 (kafka.coordinator.GroupCoordinator)
>>>>>>> [2015-11-25 15:41:13,475] INFO [GroupCoordinator 0]: Stabilized group aaa-bbb-reader generation 1 (kafka.coordinator.GroupCoordinator)
>>>>>>> [2015-11-25 15:41:13,477] INFO [GroupCoordinator 0]: Assignment received from leader for group aaa-bbb-reader for generation 1 (kafka.coordinator.GroupCoordinator)
>>>>>>> [2015-11-25 15:41:43,478] INFO [GroupCoordinator 0]: Preparing to restabilize group aaa-bbb-reader with old generation 1 (kafka.coordinator.GroupCoordinator)
>>>>>>> [2015-11-25 15:41:43,478] INFO [GroupCoordinator 0]: Group aaa-bbb-reader generation 1 is dead and removed (kafka.coordinator.GroupCoordinator)
>>>>>>>
>>>>>>> When this happens the kafka-consumer-groups describe command keeps saying that the group no longer exists or is rebalancing.
>>>>>>> What is probably even worse is that my consumers appear to be looping constantly through everything written to the topics!?
>>>>>>>
>>>>>>> Does anyone have any input on what might be happening?
>>>>>>>
>>>>>>> I'm running 0.9 locally on my laptop using one Zookeeper and one broker, both using the configuration provided in the distribution. I have 13 topics with two partitions each and a replication factor of 1. I run one producer and one consumer, also on the same machine.
>>>>>>>
>>>>>>> --
>>>>>>> Martin Skøtt
>>>>>>
>>>>>> --
>>>>>> -- Guozhang
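Tying the thread together: in the 0.9 consumer, heartbeats are only sent as part of poll(), so spending longer than session.timeout.ms processing one batch between calls to poll() gets the consumer evicted from the group. Offset commits then fail with ILLEGAL_GENERATION, and after the rebalance the consumer resumes from the last committed offset, which is why the topics appear to be replayed over and over. One way to keep poll() running while a large batch is processed is to pause the assigned partitions and hand the batch to another thread. The following is only a sketch under several assumptions: the broker address, group id, topic, deserializers, and handle() stand-in for the domain service are placeholders, and the varargs pause()/resume() signatures are the 0.9 ones.

import java.util.Arrays;
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class PausedBatchConsumer {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");     // placeholder broker
        props.put("group.id", "aaa-bbb-reader");
        props.put("enable.auto.commit", "false");              // commit manually once a batch is done
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        ExecutorService executor = Executors.newSingleThreadExecutor();
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Arrays.asList("some-topic"));        // placeholder topic

        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(1000);
            if (records.isEmpty()) {
                continue;
            }

            // Stop fetching while the batch is processed, but keep calling poll()
            // so heartbeats continue to go out within the session timeout.
            TopicPartition[] assigned =
                    consumer.assignment().toArray(new TopicPartition[0]);
            consumer.pause(assigned);

            Future<?> batch = executor.submit(() -> {
                for (ConsumerRecord<String, String> record : records) {
                    handle(record.topic(), record.value());      // stand-in for serve.handle(...)
                }
            });

            while (!batch.isDone()) {
                consumer.poll(100);                              // returns nothing while paused, still heartbeats
            }
            batch.get();                                         // surface any processing exception
            consumer.resume(assigned);
            consumer.commitSync();                               // commit only after the batch succeeded
        }
    }

    private static void handle(String topic, Object value) {
        // placeholder for the domain-specific service from the thread
    }
}

Note the caveat: if a rebalance does happen mid-batch, the pause flags are reset and the batch may be reprocessed, so the handler should tolerate duplicates. The lower-effort alternative is simply to raise session.timeout.ms to cover the worst-case batch processing time, subject to the broker's group.max.session.timeout.ms limit.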