In that case, have you looked at https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-Myconsumerseemstohavestopped%2Cwhy%3F?
Thanks,

Jun

On Wed, Aug 14, 2013 at 7:38 AM, Drew Daugherty <
drew.daughe...@returnpath.com> wrote:

> The problem is not the fact that the timeout exceptions are being thrown.
> We have tried with and without the timeout setting and, in both cases, we
> end up with threads that are stalled and not consuming data. Thus the
> problem is consumers that are registered but not consuming, with no
> rebalancing being done. We suspected a problem with zookeeper, but we have
> run smoke and latency tests and got reasonable results.
>
> -drew
>
> ----- Original Message -----
> From: Jun Rao <jun...@gmail.com>
> To: "users@kafka.apache.org" <users@kafka.apache.org>
> Sent: 8/13/2013 10:17 PM
> Subject: Re: Kafka Consumer Threads Stalled
>
> If you don't want to see ConsumerTimeoutException, just set
> consumer.timeout.ms to -1. If you do need consumer.timeout.ms larger than
> 0, make sure that on ConsumerTimeoutException, your consumer thread loops
> back and calls hasNext() on the iterator to resume consumption.
>
> Thanks,
>
> Jun
>
>
> On Tue, Aug 13, 2013 at 4:57 PM, Drew Daugherty <
> drew.daughe...@returnpath.com> wrote:
>
> > Hi,
> >
> > We are using zookeeper 3.3.6 with kafka 0.7.2. We have a topic with 8
> > partitions on each of 3 brokers that we are consuming with a consumer
> > group with multiple threads. We are using the following settings for
> > our consumers:
> >
> > zk.connectiontimeout.ms=12000000
> > fetch_size=52428800
> > queuedchunks.max=6
> > consumer.timeout.ms=5000
> >
> > Our brokers have the following configuration:
> >
> > socket.send.buffer=1048576
> > socket.receive.buffer=1048576
> > max.socket.request.bytes=104857600
> > log.flush.interval=10000
> > log.default.flush.interval.ms=1000
> > log.default.flush.scheduler.interval.ms=1000
> > log.retention.hours=4
> > log.file.size=536870912
> > enable.zookeeper=true
> > zk.connectiontimeout.ms=6000
> > zk.sessiontimeout.ms=6000
> > max.message.size=52428800
> >
> > We are noticing that after the consumer runs for a short while, some
> > threads stop consuming and start throwing the following timeout
> > exceptions:
> >
> > kafka.consumer.ConsumerTimeoutException
> >     at kafka.consumer.ConsumerIterator.makeNext(ConsumerIterator.scala:66)
> >     at kafka.consumer.ConsumerIterator.makeNext(ConsumerIterator.scala:32)
> >     at kafka.utils.IteratorTemplate.maybeComputeNext(IteratorTemplate.scala:59)
> >     at kafka.utils.IteratorTemplate.hasNext(IteratorTemplate.scala:51)
> >
> > When this happens, message consumption on the affected partitions
> > doesn't recover but stalls, and the consumer offset remains frozen. The
> > exceptions also continue to be thrown in the logs, as the thread logic
> > logs the error, then tries to create another iterator from the stream
> > and consume from it. We also notice that consumption tends to freeze on
> > 2 of the 3 brokers, but there is always one that seems to keep the
> > consumers fed. Are there settings or logic we can use to avoid or
> > recover from such exceptions?
> >
> > -drew
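For reference, a minimal sketch of the loop-and-resume pattern Jun
describes, written against the 0.8-era high-level Java consumer API
(property names in 0.7.2 differ, e.g. zk.connect/groupid rather than
zookeeper.connect/group.id). The topic name, group id, zookeeper address,
and handler below are illustrative, not taken from the thread:

import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Properties;

import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.ConsumerIterator;
import kafka.consumer.ConsumerTimeoutException;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;

public class ResumingConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("zookeeper.connect", "zk1:2181");  // illustrative
        props.put("group.id", "my-group");           // illustrative
        // A finite timeout makes hasNext() throw ConsumerTimeoutException
        // when no message arrives within 5 s; -1 would block forever instead.
        props.put("consumer.timeout.ms", "5000");

        ConsumerConnector connector =
            Consumer.createJavaConsumerConnector(new ConsumerConfig(props));

        // One stream for one consumer thread; multi-threaded setups would
        // request more streams and hand each to its own thread.
        Map<String, List<KafkaStream<byte[], byte[]>>> streams =
            connector.createMessageStreams(Collections.singletonMap("my-topic", 1));
        KafkaStream<byte[], byte[]> stream = streams.get("my-topic").get(0);

        ConsumerIterator<byte[], byte[]> it = stream.iterator();
        while (true) {
            try {
                // hasNext() blocks for up to consumer.timeout.ms, then throws.
                while (it.hasNext()) {
                    handle(it.next().message());
                }
            } catch (ConsumerTimeoutException e) {
                // Jun's point: don't abandon or recreate the iterator here.
                // Loop back and call hasNext() on the SAME iterator so
                // consumption resumes as soon as data arrives.
            }
        }
    }

    private static void handle(byte[] message) {
        System.out.println("got " + message.length + " bytes");
    }
}

The key detail is the outer while (true): the timeout is an expected,
recoverable signal that no data arrived within the window, not a reason to
tear down the stream and build a new iterator as described in the thread.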