Hi,

As already pointed out by David and Dana, your process is taking too long to handle the polled records. This long processing time causes your consumer's session to time out. To keep the session alive, the consumer must send a heartbeat request within the configured session timeout. A heartbeat request is triggered automatically by poll() or commitSync() if the last heartbeat was not sent within the interval configured through "heartbeat.interval.ms". You have the following options:

1) Decrease "max.partition.fetch.bytes" to a limit that gives you fewer records per poll, so that processing finishes before the session times out.

2) Increase the consumer's session timeout through the property "session.timeout.ms". The default is 30000 ms.

3) Call commitSync() in between your processing to commit processed records to Kafka from time to time (say, every 10 seconds or so). This triggers a heartbeat request and keeps your consumer session alive. I have seen that sometimes a heartbeat request is not triggered or answered by commitSync(); there are some defects open, fixed in version 0.10, where commitSync itself will act as a heartbeat. So if you take this approach now, make sure to call commitSync() more than once within the session timeout so there is less chance of missing a heartbeat for the whole session.
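Options 1 and 2 are plain consumer configuration; option 3 is a pacing decision inside your poll loop. Here is a minimal sketch in plain Java — the broker address, group id, topic, and all tuning values are illustrative assumptions, not recommendations, and the actual KafkaConsumer calls are shown as comments since they need a running broker:

```java
import java.util.Properties;

public class ConsumerTuning {

    // Options 1 and 2: illustrative config values (tune for your workload).
    static Properties consumerProps() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // assumption: local broker
        props.put("group.id", "my-group");                 // illustrative group id
        props.put("max.partition.fetch.bytes", "500000");  // option 1: fewer bytes -> fewer records per poll
        props.put("session.timeout.ms", "60000");          // option 2: raised from the 30000 ms default
        props.put("heartbeat.interval.ms", "3000");        // how often a heartbeat is piggybacked
        return props;
    }

    // Option 3: decide whether it is time to commit (and thereby heartbeat)
    // based on elapsed time. In a real loop this guards consumer.commitSync().
    static boolean shouldCommit(long lastCommitMs, long nowMs, long intervalMs) {
        return nowMs - lastCommitMs >= intervalMs;
    }

    public static void main(String[] args) {
        Properties props = consumerProps();
        System.out.println("session.timeout.ms=" + props.getProperty("session.timeout.ms"));

        // Sketch of the poll loop (Kafka calls commented out; they need a broker):
        // KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        // consumer.subscribe(Collections.singletonList("my-topic"));
        // long lastCommit = System.currentTimeMillis();
        // while (true) {
        //     ConsumerRecords<String, String> records = consumer.poll(100);
        //     for (ConsumerRecord<String, String> record : records) {
        //         process(record);                      // your long-running work
        //         long now = System.currentTimeMillis();
        //         if (shouldCommit(lastCommit, now, 10_000)) {
        //             consumer.commitSync();            // commits progress and triggers a heartbeat
        //             lastCommit = now;
        //         }
        //     }
        // }
        System.out.println("commit due after 10s: " + shouldCommit(0, 10_000, 10_000));
    }
}
```

Committing inside the record loop (rather than once per poll) is what keeps heartbeats flowing even when a single batch takes longer than the session timeout to process.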
If you configure both 1 and 2, there is still no guarantee that your processing time will not exceed the specified session timeout, since you may depend on external systems that respond slowly in rare but possible scenarios. This is why I also implement the 3rd approach, which additionally alerts me well in advance when my consumer is marked dead for some reason.

Regards,
Vinay Sharma

On Mon, May 2, 2016 at 11:53 PM, David Buschman <david.busch...@timeli.io> wrote:

> To add to what Dana said, we fixed this issue on AWS by setting
> “max.partition.fetch.bytes” to a smaller value so our consumer would poll
> more frequently.
>
> Try setting “max.partition.fetch.bytes” to “750000”, then “500000”, then
> “250000”, … until the errors stop occurring. The default is 1,048,576.
>
> Thanks,
> DaVe.
>
>
> > On May 2, 2016, at 8:48 PM, Dana Powers <dana.pow...@gmail.com> wrote:
> >
> > It means there was a consumer group rebalance that this consumer missed.
> > You may be spending too much time in msg processing between poll() calls.
> >
> > -Dana
> >