Hi,

We are using ZooKeeper 3.3.6 with Kafka 0.7.2. We have a topic with 8 partitions on each of 3 brokers, which we consume with a multi-threaded consumer group. Our consumers use the following settings:

  zk.connectiontimeout.ms=12000000
  fetch.size=52428800
  queuedchunks.max=6
  consumer.timeout.ms=5000
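Each of our consumer threads runs roughly the following loop (a simplified sketch rather than our exact code; the ZooKeeper connect string, group id, and topic name are placeholders, and message handling is elided):

import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.ConsumerIterator;
import kafka.consumer.ConsumerTimeoutException;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;
import kafka.message.Message;

public class ConsumerThreadSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("zk.connect", "zk1:2181,zk2:2181,zk3:2181"); // placeholder hosts
        props.put("groupid", "our-group");                     // placeholder group id
        props.put("zk.connectiontimeout.ms", "12000000");
        props.put("fetch.size", "52428800");
        props.put("queuedchunks.max", "6");
        props.put("consumer.timeout.ms", "5000");

        ConsumerConnector connector =
            Consumer.createJavaConsumerConnector(new ConsumerConfig(props));

        // One stream per consumer thread; a single stream is shown here.
        Map<String, Integer> topicCountMap = new HashMap<String, Integer>();
        topicCountMap.put("our-topic", 1); // placeholder topic name

        List<KafkaStream<Message>> streams =
            connector.createMessageStreams(topicCountMap).get("our-topic");
        KafkaStream<Message> stream = streams.get(0);

        while (true) {
            // After a timeout we log the error and pull a fresh iterator
            // from the stream, as described below.
            ConsumerIterator<Message> it = stream.iterator();
            try {
                while (it.hasNext()) {
                    Message message = it.next();
                    // ... process the message ...
                }
            } catch (ConsumerTimeoutException e) {
                System.err.println("consumer timed out: " + e);
            }
        }
    }
}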
Our brokers have the following configuration:

  socket.send.buffer=1048576
  socket.receive.buffer=1048576
  max.socket.request.bytes=104857600
  log.flush.interval=10000
  log.default.flush.interval.ms=1000
  log.default.flush.scheduler.interval.ms=1000
  log.retention.hours=4
  log.file.size=536870912
  enable.zookeeper=true
  zk.connectiontimeout.ms=6000
  zk.sessiontimeout.ms=6000
  max.message.size=52428800

After the consumers have run for a short while, we notice that some threads stop consuming and start throwing the following timeout exception:

  kafka.consumer.ConsumerTimeoutException
      at kafka.consumer.ConsumerIterator.makeNext(ConsumerIterator.scala:66)
      at kafka.consumer.ConsumerIterator.makeNext(ConsumerIterator.scala:32)
      at kafka.utils.IteratorTemplate.maybeComputeNext(IteratorTemplate.scala:59)
      at kafka.utils.IteratorTemplate.hasNext(IteratorTemplate.scala:51)

When this happens, consumption on the affected partitions stalls and never recovers: the consumer offsets stay frozen. The exceptions also keep appearing in the logs, because (as in the sketch above) each thread logs the error, creates another iterator from the stream, and tries to consume from it again. We have also noticed that consumption tends to freeze on two of the three brokers, while one broker always seems to keep its consumers fed.

Are there settings or logic we can use to avoid or recover from these exceptions?

-drew