Hi,

We are using ZooKeeper 3.3.6 with Kafka 0.7.2. We have a topic with 8 
partitions on each of 3 brokers, which we consume with a multi-threaded 
consumer group. We are using the following settings for our consumers:
zk.connectiontimeout.ms=12000000
fetch_size=52428800
queuedchunks.max=6
consumer.timeout.ms=5000

Our brokers have the following configuration:
socket.send.buffer=1048576
socket.receive.buffer=1048576
max.socket.request.bytes=104857600
log.flush.interval=10000
log.default.flush.interval.ms=1000
log.default.flush.scheduler.interval.ms=1000
log.retention.hours=4
log.file.size=536870912
enable.zookeeper=true
zk.connectiontimeout.ms=6000
zk.sessiontimeout.ms=6000
max.message.size=52428800

We are noticing that after the consumer runs for a short while, some threads 
stop consuming and start throwing the following timeout exceptions:
kafka.consumer.ConsumerTimeoutException
        at kafka.consumer.ConsumerIterator.makeNext(ConsumerIterator.scala:66)
        at kafka.consumer.ConsumerIterator.makeNext(ConsumerIterator.scala:32)
        at kafka.utils.IteratorTemplate.maybeComputeNext(IteratorTemplate.scala:59)
        at kafka.utils.IteratorTemplate.hasNext(IteratorTemplate.scala:51)

When this happens, message consumption on the affected partitions doesn't 
recover: it stalls and the consumer offset remains frozen.  The exceptions also 
continue to appear in the logs, since our thread logic logs the error, creates 
another iterator from the stream, and tries to consume from it.  We also notice 
that consumption tends to freeze on 2 of the 3 brokers, while one broker always 
seems to keep its consumers fed.  Are there settings or logic we can use to 
avoid or recover from such exceptions?
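For reference, here is the kind of handling we have been experimenting with. This is a minimal, self-contained sketch of a consume loop that treats the timeout as a "no data yet" signal and keeps polling the same iterator, rather than discarding it and building a new one from the stream. The ConsumerTimeoutException and stream iterator below are stubs standing in for the real kafka.consumer classes (which need a live broker), and the assumption that the 0.7 iterator can be safely reused after a timeout is exactly what we are unsure about:

```java
import java.util.Iterator;

// Stand-in for kafka.consumer.ConsumerTimeoutException: thrown when
// consumer.timeout.ms elapses with no message available on the stream.
class ConsumerTimeoutException extends RuntimeException {}

// Stub iterator mimicking a Kafka message stream whose partition goes
// quiet periodically: it yields messages, but every third poll (up to
// call 9) simulates the 5s consumer.timeout.ms firing.
class StubStreamIterator implements Iterator<String> {
    private int calls = 0;

    @Override
    public boolean hasNext() {
        calls++;
        if (calls % 3 == 0 && calls <= 9) {
            throw new ConsumerTimeoutException();
        }
        return calls <= 10; // stream "ends" after 10 polls
    }

    @Override
    public String next() {
        return "message-" + calls;
    }
}

public class TimeoutTolerantConsumer {
    // Consume loop: a ConsumerTimeoutException is treated as an idle
    // signal, not a fatal error. We keep polling the SAME iterator and
    // only give up after several consecutive quiet periods.
    static int consume(Iterator<String> it) {
        int consumed = 0;
        int idlePolls = 0;
        while (idlePolls < 5) {
            try {
                if (!it.hasNext()) break; // stream closed / shutdown
                it.next();
                consumed++;
                idlePolls = 0; // got data, reset the idle counter
            } catch (ConsumerTimeoutException e) {
                idlePolls++;   // partition quiet; log and poll again
            }
        }
        return consumed;
    }

    public static void main(String[] args) {
        int n = consume(new StubStreamIterator());
        System.out.println("consumed " + n + " messages");
    }
}
```

If reusing the iterator after a timeout is not supported in 0.7, the other option we see is setting consumer.timeout.ms to -1 so the iterator blocks indefinitely instead of throwing, though that makes clean shutdown of the threads harder.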

-drew