Alex Shafer created KAFKA-1643:
----------------------------------
Summary: message.send.max.retries not respected when no brokers
are up
Key: KAFKA-1643
URL: https://issues.apache.org/jira/browse/KAFKA-1643
Project: Kafka
Issue Type: Bug
Components: producer
Affects Versions: 0.8.1.1
Reporter: Alex Shafer
Assignee: Jun Rao
{noformat}
2014-09-19 20:20:04,320 WARN kafka.producer.async.DefaultEventHandler: Failed to send producer request with correlation id 1944531 to broker 6405 with data for partitions [lva1-spades-hdfs-audit,2]
java.io.IOException: Broken pipe
	at sun.nio.ch.FileDispatcherImpl.writev0(Native Method)
	at sun.nio.ch.SocketDispatcher.writev(SocketDispatcher.java:51)
	at sun.nio.ch.IOUtil.write(IOUtil.java:148)
	at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:499)
	at java.nio.channels.SocketChannel.write(SocketChannel.java:502)
	at kafka.network.BoundedByteBufferSend.writeTo(BoundedByteBufferSend.scala:56)
	at kafka.network.Send$class.writeCompletely(Transmission.scala:75)
	at kafka.network.BoundedByteBufferSend.writeCompletely(BoundedByteBufferSend.scala:26)
	at kafka.network.BlockingChannel.send(BlockingChannel.scala:92)
	at kafka.producer.SyncProducer.liftedTree1$1(SyncProducer.scala:72)
	at kafka.producer.SyncProducer.kafka$producer$SyncProducer$$doSend(SyncProducer.scala:71)
	at kafka.producer.SyncProducer$$anonfun$send$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(SyncProducer.scala:102)
	at kafka.producer.SyncProducer$$anonfun$send$1$$anonfun$apply$mcV$sp$1.apply(SyncProducer.scala:102)
	at kafka.producer.SyncProducer$$anonfun$send$1$$anonfun$apply$mcV$sp$1.apply(SyncProducer.scala:102)
	at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
	at kafka.producer.SyncProducer$$anonfun$send$1.apply$mcV$sp(SyncProducer.scala:101)
	at kafka.producer.SyncProducer$$anonfun$send$1.apply(SyncProducer.scala:101)
	at kafka.producer.SyncProducer$$anonfun$send$1.apply(SyncProducer.scala:101)
	at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
	at kafka.producer.SyncProducer.send(SyncProducer.scala:100)
	at kafka.producer.async.DefaultEventHandler.kafka$producer$async$DefaultEventHandler$$send(DefaultEventHandler.scala:255)
	at kafka.producer.async.DefaultEventHandler$$anonfun$dispatchSerializedData$1.apply(DefaultEventHandler.scala:106)
	at kafka.producer.async.DefaultEventHandler$$anonfun$dispatchSerializedData$1.apply(DefaultEventHandler.scala:100)
	at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:80)
	at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:80)
	at scala.collection.Iterator$class.foreach(Iterator.scala:631)
	at scala.collection.mutable.HashTable$$anon$1.foreach(HashTable.scala:161)
	at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:194)
	at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
	at scala.collection.mutable.HashMap.foreach(HashMap.scala:80)
	at kafka.producer.async.DefaultEventHandler.dispatchSerializedData(DefaultEventHandler.scala:100)
	at kafka.producer.async.DefaultEventHandler.handle(DefaultEventHandler.scala:72)
	at kafka.producer.Producer.send(Producer.scala:76)
	at kafka.producer.KafkaLog4jAppender.append(KafkaLog4jAppender.scala:93)
	at org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:251)
	at org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:66)
	at org.apache.log4j.Category.callAppenders(Category.java:206)
	at org.apache.log4j.Category.forcedLog(Category.java:391)
	at org.apache.log4j.Category.log(Category.java:856)
	at org.apache.commons.logging.impl.Log4JLogger.info(Log4JLogger.java:176)
	........
2014-09-19 20:20:04,331 INFO kafka.producer.async.DefaultEventHandler: Back off for 100 ms before retrying send. Remaining retries = 3
2014-09-19 20:20:04,433 INFO kafka.client.ClientUtils$: Fetching metadata from broker id:17,host:eat1-hcl6393.grid.linkedin.com,port:9092 with correlation id 1944532 for 1 topic(s) Set(lva1-spades-hdfs-audit)
2014-09-19 20:20:04,451 ERROR kafka.producer.SyncProducer: Producer connection to eat1-hcl6393.grid.linkedin.com:9092 unsuccessful
java.net.ConnectException: Connection refused
.....
{noformat}
It would seem that IOExceptions raised during a send respect message.send.max.retries, but the ConnectExceptions raised during the subsequent metadata fetch do not, so the producer keeps retrying indefinitely when no brokers are up. When those exceptions are being logged to a local file, this can lock the service up under the extreme volume of writes taking place.
Additionally, there should be another configuration property to completely disable a producer once a given number of messages have failed to send.
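For context, these are the retry-related settings the old (0.8.x Scala) producer exposes; the values below are illustrative, matching the "Remaining retries = 3" / "Back off for 100 ms" lines in the log above. Note there is no property here that bounds the metadata-fetch retries or shuts the producer off after repeated failures:

{noformat}
# 0.8.x producer config (illustrative values)
metadata.broker.list=broker1:9092,broker2:9092

# Honored for send-side failures such as the Broken pipe above
message.send.max.retries=3
retry.backoff.ms=100

# No equivalent limit applies to the ConnectExceptions hit
# while re-fetching metadata when all brokers are down.
{noformat}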
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)