Alex Shafer created KAFKA-1643:
----------------------------------

             Summary: message.send.max.retries not respected when no brokers are up
                 Key: KAFKA-1643
                 URL: https://issues.apache.org/jira/browse/KAFKA-1643
             Project: Kafka
          Issue Type: Bug
          Components: producer
    Affects Versions: 0.8.1.1
            Reporter: Alex Shafer
            Assignee: Jun Rao
{noformat}
2014-09-19 20:20:04,320 WARN kafka.producer.async.DefaultEventHandler: Failed to send producer request with correlation id 1944531 to broker 6405 with data for partitions [lva1-spades-hdfs-audit,2]
java.io.IOException: Broken pipe
        at sun.nio.ch.FileDispatcherImpl.writev0(Native Method)
        at sun.nio.ch.SocketDispatcher.writev(SocketDispatcher.java:51)
        at sun.nio.ch.IOUtil.write(IOUtil.java:148)
        at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:499)
        at java.nio.channels.SocketChannel.write(SocketChannel.java:502)
        at kafka.network.BoundedByteBufferSend.writeTo(BoundedByteBufferSend.scala:56)
        at kafka.network.Send$class.writeCompletely(Transmission.scala:75)
        at kafka.network.BoundedByteBufferSend.writeCompletely(BoundedByteBufferSend.scala:26)
        at kafka.network.BlockingChannel.send(BlockingChannel.scala:92)
        at kafka.producer.SyncProducer.liftedTree1$1(SyncProducer.scala:72)
        at kafka.producer.SyncProducer.kafka$producer$SyncProducer$$doSend(SyncProducer.scala:71)
        at kafka.producer.SyncProducer$$anonfun$send$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(SyncProducer.scala:102)
        at kafka.producer.SyncProducer$$anonfun$send$1$$anonfun$apply$mcV$sp$1.apply(SyncProducer.scala:102)
        at kafka.producer.SyncProducer$$anonfun$send$1$$anonfun$apply$mcV$sp$1.apply(SyncProducer.scala:102)
        at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
        at kafka.producer.SyncProducer$$anonfun$send$1.apply$mcV$sp(SyncProducer.scala:101)
        at kafka.producer.SyncProducer$$anonfun$send$1.apply(SyncProducer.scala:101)
        at kafka.producer.SyncProducer$$anonfun$send$1.apply(SyncProducer.scala:101)
        at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
        at kafka.producer.SyncProducer.send(SyncProducer.scala:100)
        at kafka.producer.async.DefaultEventHandler.kafka$producer$async$DefaultEventHandler$$send(DefaultEventHandler.scala:255)
        at kafka.producer.async.DefaultEventHandler$$anonfun$dispatchSerializedData$1.apply(DefaultEventHandler.scala:106)
        at kafka.producer.async.DefaultEventHandler$$anonfun$dispatchSerializedData$1.apply(DefaultEventHandler.scala:100)
        at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:80)
        at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:80)
        at scala.collection.Iterator$class.foreach(Iterator.scala:631)
        at scala.collection.mutable.HashTable$$anon$1.foreach(HashTable.scala:161)
        at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:194)
        at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
        at scala.collection.mutable.HashMap.foreach(HashMap.scala:80)
        at kafka.producer.async.DefaultEventHandler.dispatchSerializedData(DefaultEventHandler.scala:100)
        at kafka.producer.async.DefaultEventHandler.handle(DefaultEventHandler.scala:72)
        at kafka.producer.Producer.send(Producer.scala:76)
        at kafka.producer.KafkaLog4jAppender.append(KafkaLog4jAppender.scala:93)
        at org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:251)
        at org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:66)
        at org.apache.log4j.Category.callAppenders(Category.java:206)
        at org.apache.log4j.Category.forcedLog(Category.java:391)
        at org.apache.log4j.Category.log(Category.java:856)
        at org.apache.commons.logging.impl.Log4JLogger.info(Log4JLogger.java:176)
        ........
2014-09-19 20:20:04,331 INFO kafka.producer.async.DefaultEventHandler: Back off for 100 ms before retrying send. Remaining retries = 3
2014-09-19 20:20:04,433 INFO kafka.client.ClientUtils$: Fetching metadata from broker id:17,host:eat1-hcl6393.grid.linkedin.com,port:9092 with correlation id 1944532 for 1 topic(s) Set(lva1-spades-hdfs-audit)
2014-09-19 20:20:04,451 ERROR kafka.producer.SyncProducer: Producer connection to eat1-hcl6393.grid.linkedin.com:9092 unsuccessful
java.net.ConnectException: Connection refused
.....
{noformat}

It would seem that IOExceptions raised during a send respect message.send.max.retries, but the ConnectExceptions raised on the subsequent metadata fetches do not, so the producer keeps retrying indefinitely when no brokers are up. When these exceptions are logged to a local file, the sheer volume of writes can effectively lock up the service. Additionally, there should be another configuration property that disables a producer entirely once a given number of messages have failed to send.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
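The proposed kill switch could look something like the following. This is only a minimal sketch of the requested behavior, not anything in Kafka: the class name, the threshold parameter, and the wrapper API are all hypothetical, standing in for a setting like the one the report asks for. The idea is to count consecutive failed sends and stop attempting sends (rather than retrying forever) once the threshold is crossed.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the requested feature; FailedSendBreaker and
// maxFailedSends are illustrative names, not part of the Kafka producer.
public class FailedSendBreaker {
    private final int maxFailedSends;                       // stand-in for the proposed config property
    private final AtomicInteger consecutiveFailures = new AtomicInteger(0);
    private volatile boolean disabled = false;

    public FailedSendBreaker(int maxFailedSends) {
        this.maxFailedSends = maxFailedSends;
    }

    // Wraps a send attempt. Returns false if the producer has been disabled
    // and the message was dropped without any network activity.
    public boolean trySend(Runnable send) {
        if (disabled) {
            return false;                                   // no more connect/retry storms
        }
        try {
            send.run();
            consecutiveFailures.set(0);                     // any success resets the counter
            return true;
        } catch (RuntimeException e) {
            if (consecutiveFailures.incrementAndGet() >= maxFailedSends) {
                disabled = true;                            // trip the breaker permanently
            }
            return true;                                    // the attempt itself was made
        }
    }

    public boolean isDisabled() {
        return disabled;
    }

    public static void main(String[] args) {
        FailedSendBreaker breaker = new FailedSendBreaker(3);
        for (int i = 0; i < 5; i++) {
            breaker.trySend(() -> { throw new RuntimeException("Connection refused"); });
        }
        System.out.println(breaker.isDisabled());           // prints "true"
    }
}
```

Once tripped, the breaker drops messages instead of retrying, which would avoid both the unbounded ConnectException loop and the flood of local log writes described above.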