There are a couple of things you can try.

1. See if brokers 1 and 2 are indeed registered in ZK (see the broker registration info in https://cwiki.apache.org/confluence/display/KAFKA/Kafka+data+structures+in+Zookeeper); a quick way to check is sketched below.
2. Does restarting brokers 1 and 2 solve the issue?
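For example, with the zookeeper-shell.sh tool that ships in the Kafka distribution's bin directory, something along these lines should work (zkhost:2181 below is a placeholder for your actual ZooKeeper connect string):

    # Open a shell against the ZooKeeper ensemble Kafka uses
    bin/zookeeper-shell.sh zkhost:2181

    # List the ids of all currently registered (live) brokers;
    # both 1 and 2 should show up here
    ls /brokers/ids

    # Inspect one broker's registration; it should contain the
    # broker's host and port
    get /brokers/ids/1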
Thanks,

Jun

On Tue, Jul 1, 2014 at 2:56 PM, England, Michael <mengl...@homeadvisor.com> wrote:

We're using kafka 0.8.1.1.

-----Original Message-----
From: Jun Rao [mailto:jun...@gmail.com]
Sent: Monday, June 30, 2014 10:23 AM
To: users@kafka.apache.org
Subject: Re: Failed to send messages after 3 tries

Which version of Kafka are you using?

Thanks,

Jun

On Fri, Jun 27, 2014 at 11:57 AM, England, Michael <mengl...@homeadvisor.com> wrote:

Neha,

In state-change.log I see lots of logging from when I last started up kafka, and nothing after that. I do see a bunch of errors of the form:

[2014-06-25 13:21:37,124] ERROR Controller 1 epoch 11 initiated state change for partition [lead.indexer,37] from OfflinePartition to OnlinePartition failed (state.change.logger)
kafka.common.NoReplicaOnlineException: No replica for partition [lead.indexer,37] is alive. Live brokers are: [Set()], Assigned replicas are: [List(1)]
        at kafka.controller.OfflinePartitionLeaderSelector.selectLeader(PartitionLeaderSelector.scala:61)
        at kafka.controller.PartitionStateMachine.electLeaderForPartition(PartitionStateMachine.scala:336)
        at kafka.controller.PartitionStateMachine.kafka$controller$PartitionStateMachine$$handleStateChange(PartitionStateMachine.scala:185)
        at kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:99)
        at kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:96)
        at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:743)
        at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:95)
        at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:95)
        at scala.collection.Iterator$class.foreach(Iterator.scala:772)
        at scala.collection.mutable.HashTable$$anon$1.foreach(HashTable.scala:157)
        at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:190)
        at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:45)
        at scala.collection.mutable.HashMap.foreach(HashMap.scala:95)
        at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:742)
        at kafka.controller.PartitionStateMachine.triggerOnlinePartitionStateChange(PartitionStateMachine.scala:96)
        at kafka.controller.PartitionStateMachine.startup(PartitionStateMachine.scala:68)
        at kafka.controller.KafkaController.onControllerFailover(KafkaController.scala:312)
        at kafka.controller.KafkaController$$anonfun$1.apply$mcV$sp(KafkaController.scala:162)
        at kafka.server.ZookeeperLeaderElector.elect(ZookeeperLeaderElector.scala:63)
        at kafka.server.ZookeeperLeaderElector$$anonfun$startup$1.apply$mcZ$sp(ZookeeperLeaderElector.scala:49)
        at kafka.server.ZookeeperLeaderElector$$anonfun$startup$1.apply(ZookeeperLeaderElector.scala:47)
        at kafka.server.ZookeeperLeaderElector$$anonfun$startup$1.apply(ZookeeperLeaderElector.scala:47)
        at kafka.utils.Utils$.inLock(Utils.scala:538)
        at kafka.server.ZookeeperLeaderElector.startup(ZookeeperLeaderElector.scala:47)
        at kafka.controller.KafkaController$$anonfun$startup$1.apply$mcV$sp(KafkaController.scala:637)
        at kafka.controller.KafkaController$$anonfun$startup$1.apply(KafkaController.scala:633)
        at kafka.controller.KafkaController$$anonfun$startup$1.apply(KafkaController.scala:633)
        at kafka.utils.Utils$.inLock(Utils.scala:538)
        at kafka.controller.KafkaController.startup(KafkaController.scala:633)
        at kafka.server.KafkaServer.startup(KafkaServer.scala:96)
        at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:34)
        at kafka.Kafka$.main(Kafka.scala:46)
        at kafka.Kafka.main(Kafka.scala)

And also errors of the form:

[2014-06-25 13:21:42,502] ERROR Broker 1 aborted the become-follower state change with correlation id 4 from controller 1 epoch 10 for partition [lead.indexer,11] new leader -1 (state.change.logger)

Are either of these of concern?

In controller.log I also see logging from start-up, and then nothing. There are no errors, but I do see some warnings. They seem rather benign. Here's a sample:

[2014-06-25 13:21:47,678] WARN [OfflinePartitionLeaderSelector]: No broker in ISR is alive for [lead.indexer,45]. Elect leader 1 from live brokers 1. There's potential data loss. (kafka.controller.OfflinePartitionLeaderSelector)
[2014-06-25 13:21:47,678] INFO [OfflinePartitionLeaderSelector]: Selected new leader and ISR {"leader":1,"leader_epoch":3,"isr":[1]} for offline partition [lead.indexer,45] (kafka.controller.OfflinePartitionLeaderSelector)

In kafka.out I see this error message:

[2014-06-27 11:50:01,366] ERROR Closing socket for /10.1.162.67 because of error (kafka.network.Processor)
java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
        at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
        at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:89)
        at sun.nio.ch.IOUtil.write(IOUtil.java:60)
        at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:450)
        at kafka.api.FetchResponseSend.writeTo(FetchResponse.scala:217)
        at kafka.network.Processor.write(SocketServer.scala:375)
        at kafka.network.Processor.run(SocketServer.scala:247)
        at java.lang.Thread.run(Thread.java:722)

My understanding is that this is fine, and simply correlates with me shutting down a producer or consumer.

kafka-request.log is empty.

In server.log there are just a few lines that look like this:

[2014-06-27 12:04:10,620] INFO Closing socket connection to /10.1.162.67. (kafka.network.Processor)
[2014-06-27 12:04:11,681] INFO Closing socket connection to /10.1.162.67. (kafka.network.Processor)
[2014-06-27 12:12:40,561] INFO Closing socket connection to /10.3.230.131. (kafka.network.Processor)
[2014-06-27 12:12:40,776] INFO Closing socket connection to /10.3.230.126. (kafka.network.Processor)
[2014-06-27 12:12:40,776] INFO Closing socket connection to /10.3.230.126. (kafka.network.Processor)
[2014-06-27 12:12:40,803] INFO Closing socket connection to /10.3.230.126. (kafka.network.Processor)
[2014-06-27 12:12:40,804] INFO Closing socket connection to /10.3.230.126. (kafka.network.Processor)
[2014-06-27 12:12:44,900] INFO Closing socket connection to /10.3.230.131. (kafka.network.Processor)
[2014-06-27 12:17:44,242] INFO Closing socket connection to /10.1.162.114. (kafka.network.Processor)
If you'd like to see more log output, please let me know the best way to send you the complete files. Some of the logs are large, and I'm reluctant to send them to the mailing list as attachments.

Thanks,

Mike

-----Original Message-----
From: Neha Narkhede [mailto:neha.narkh...@gmail.com]
Sent: Friday, June 27, 2014 11:30 AM
To: users@kafka.apache.org
Subject: Re: Failed to send messages after 3 tries

I'm not so sure what is causing those exceptions. When you send data, do you see any errors in the server logs? Could you send them around?

On Fri, Jun 27, 2014 at 10:00 AM, England, Michael <mengl...@homeadvisor.com> wrote:

Neha,

Apologies for the slow response. I was out yesterday.

To answer your questions:
-- Is the LeaderNotAvailableException repeatable? Yes. It happens whenever I send a message to that topic.
-- Are you running Kafka in the cloud? No.

Does this problem indicate that the topic is corrupt? If so, what would I need to do to clean it up?

Thanks,

Mike

-----Original Message-----
From: Neha Narkhede [mailto:neha.narkh...@gmail.com]
Sent: Wednesday, June 25, 2014 11:24 PM
To: users@kafka.apache.org
Subject: Re: Failed to send messages after 3 tries

The output from the list topic tool suggests that a leader is available for all partitions. Is the LeaderNotAvailableException repeatable? Are you running Kafka in the cloud?

On Wed, Jun 25, 2014 at 4:03 PM, England, Michael <mengl...@homeadvisor.com> wrote:

By the way, this is what I get when I describe the topic:

Topic:lead.indexer  PartitionCount:53  ReplicationFactor:1  Configs:
    Topic: lead.indexer  Partition: 0   Leader: 2  Replicas: 2  Isr: 2
    Topic: lead.indexer  Partition: 1   Leader: 1  Replicas: 1  Isr: 1
    Topic: lead.indexer  Partition: 2   Leader: 2  Replicas: 2  Isr: 2
    Topic: lead.indexer  Partition: 3   Leader: 1  Replicas: 1  Isr: 1
    Topic: lead.indexer  Partition: 4   Leader: 2  Replicas: 2  Isr: 2
    Topic: lead.indexer  Partition: 5   Leader: 1  Replicas: 1  Isr: 1
    Topic: lead.indexer  Partition: 6   Leader: 2  Replicas: 2  Isr: 2
    Topic: lead.indexer  Partition: 7   Leader: 1  Replicas: 1  Isr: 1
    Topic: lead.indexer  Partition: 8   Leader: 2  Replicas: 2  Isr: 2
    Topic: lead.indexer  Partition: 9   Leader: 1  Replicas: 1  Isr: 1
    Topic: lead.indexer  Partition: 10  Leader: 2  Replicas: 2  Isr: 2
    Topic: lead.indexer  Partition: 11  Leader: 1  Replicas: 1  Isr: 1
    Topic: lead.indexer  Partition: 12  Leader: 2  Replicas: 2  Isr: 2
    Topic: lead.indexer  Partition: 13  Leader: 1  Replicas: 1  Isr: 1
    Topic: lead.indexer  Partition: 14  Leader: 2  Replicas: 2  Isr: 2
    Topic: lead.indexer  Partition: 15  Leader: 1  Replicas: 1  Isr: 1
    Topic: lead.indexer  Partition: 16  Leader: 2  Replicas: 2  Isr: 2
    Topic: lead.indexer  Partition: 17  Leader: 1  Replicas: 1  Isr: 1
    Topic: lead.indexer  Partition: 18  Leader: 2  Replicas: 2  Isr: 2
    Topic: lead.indexer  Partition: 19  Leader: 1  Replicas: 1  Isr: 1
    Topic: lead.indexer  Partition: 20  Leader: 2  Replicas: 2  Isr: 2
    Topic: lead.indexer  Partition: 21  Leader: 1  Replicas: 1  Isr: 1
    Topic: lead.indexer  Partition: 22  Leader: 2  Replicas: 2  Isr: 2
    Topic: lead.indexer  Partition: 23  Leader: 1  Replicas: 1  Isr: 1
    Topic: lead.indexer  Partition: 24  Leader: 2  Replicas: 2  Isr: 2
    Topic: lead.indexer  Partition: 25  Leader: 1  Replicas: 1  Isr: 1
    Topic: lead.indexer  Partition: 26  Leader: 2  Replicas: 2  Isr: 2
    Topic: lead.indexer  Partition: 27  Leader: 1  Replicas: 1  Isr: 1
    Topic: lead.indexer  Partition: 28  Leader: 2  Replicas: 2  Isr: 2
    Topic: lead.indexer  Partition: 29  Leader: 1  Replicas: 1  Isr: 1
    Topic: lead.indexer  Partition: 30  Leader: 2  Replicas: 2  Isr: 2
    Topic: lead.indexer  Partition: 31  Leader: 1  Replicas: 1  Isr: 1
    Topic: lead.indexer  Partition: 32  Leader: 2  Replicas: 2  Isr: 2
    Topic: lead.indexer  Partition: 33  Leader: 1  Replicas: 1  Isr: 1
    Topic: lead.indexer  Partition: 34  Leader: 2  Replicas: 2  Isr: 2
    Topic: lead.indexer  Partition: 35  Leader: 1  Replicas: 1  Isr: 1
    Topic: lead.indexer  Partition: 36  Leader: 2  Replicas: 2  Isr: 2
    Topic: lead.indexer  Partition: 37  Leader: 1  Replicas: 1  Isr: 1
    Topic: lead.indexer  Partition: 38  Leader: 2  Replicas: 2  Isr: 2
    Topic: lead.indexer  Partition: 39  Leader: 1  Replicas: 1  Isr: 1
    Topic: lead.indexer  Partition: 40  Leader: 2  Replicas: 2  Isr: 2
    Topic: lead.indexer  Partition: 41  Leader: 1  Replicas: 1  Isr: 1
    Topic: lead.indexer  Partition: 42  Leader: 2  Replicas: 2  Isr: 2
    Topic: lead.indexer  Partition: 43  Leader: 1  Replicas: 1  Isr: 1
    Topic: lead.indexer  Partition: 44  Leader: 2  Replicas: 2  Isr: 2
    Topic: lead.indexer  Partition: 45  Leader: 1  Replicas: 1  Isr: 1
    Topic: lead.indexer  Partition: 46  Leader: 2  Replicas: 2  Isr: 2
    Topic: lead.indexer  Partition: 47  Leader: 1  Replicas: 1  Isr: 1
    Topic: lead.indexer  Partition: 48  Leader: 2  Replicas: 2  Isr: 2
    Topic: lead.indexer  Partition: 49  Leader: 1  Replicas: 1  Isr: 1
    Topic: lead.indexer  Partition: 50  Leader: 2  Replicas: 2  Isr: 2
    Topic: lead.indexer  Partition: 51  Leader: 1  Replicas: 1  Isr: 1
    Topic: lead.indexer  Partition: 52  Leader: 2  Replicas: 2  Isr: 2
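For reference, output like the above comes from the topic admin tool; on 0.8.1.x the invocation would be something like the following, with zkhost:2181 again standing in for the real ZooKeeper connect string:

    bin/kafka-topics.sh --describe --zookeeper zkhost:2181 --topic lead.indexer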
-----Original Message-----
From: England, Michael
Sent: Wednesday, June 25, 2014 4:58 PM
To: users@kafka.apache.org
Subject: RE: Failed to send messages after 3 tries

Ok, at WARN level I see the following:

2014-06-25 16:46:16 WARN kafka-consumer-sp_lead.index.processor1 kafka.producer.BrokerPartitionInfo - Error while fetching metadata [{TopicMetadata for topic lead.indexer -> No partition metadata for topic lead.indexer due to kafka.common.LeaderNotAvailableException}] for topic [lead.indexer]: class kafka.common.LeaderNotAvailableException

Any suggestions about how to address this? I see that there are some threads about this in the mailing list archive. I'll start to look through them.

Thanks,

Mike
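A note on the retry behavior surfacing here: the 0.8 producer re-fetches metadata and retries a failed send a fixed number of times before throwing FailedToSendMessageException, and those limits are configurable. A sketch of the relevant producer properties; the names are the documented 0.8 settings, while the values are illustrative only:

    # Number of retries before the producer gives up; the default of 3
    # is where "Failed to send messages after 3 tries" comes from
    message.send.max.retries=5

    # Pause before each retry, which also gives leader election time
    # to complete (milliseconds)
    retry.backoff.ms=500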
-----Original Message-----
From: Neha Narkhede [mailto:neha.narkh...@gmail.com]
Sent: Wednesday, June 25, 2014 4:47 PM
To: users@kafka.apache.org
Subject: Re: Failed to send messages after 3 tries

It should be at WARN.

On Wed, Jun 25, 2014 at 3:42 PM, England, Michael <mengl...@homeadvisor.com> wrote:

Neha,

I don't see that error message in the logs. The error that I included in my original email is the only error that I see from Kafka.

Do I need to change log levels to get the info that you need?

Mike

-----Original Message-----
From: Neha Narkhede [mailto:neha.narkh...@gmail.com]
Sent: Wednesday, June 25, 2014 4:31 PM
To: users@kafka.apache.org
Subject: Re: Failed to send messages after 3 tries

Could you provide information on why each retry failed? Look for an error message that says "Failed to send producer request".

On Wed, Jun 25, 2014 at 2:18 PM, England, Michael <mengl...@homeadvisor.com> wrote:

Hi,

I get the following error from my producer when sending a message:

Caused by: kafka.common.FailedToSendMessageException: Failed to send messages after 3 tries.
        at kafka.producer.async.DefaultEventHandler.handle(DefaultEventHandler.scala:90)
        at kafka.producer.Producer.send(Producer.scala:76)
        at kafka.javaapi.producer.Producer.send(Producer.scala:42)
        at com.servicemagic.kafka.producer.KafkaProducerTemplate.send(KafkaProducerTemplate.java:37)
        ... 31 more

The producer is running locally; the broker is on a different machine. I can telnet to the broker, so it isn't a network issue. Also, I have other producers that work fine using the same broker (but a different topic).

I've checked the various logs on the broker, but I don't see anything obvious in them. I'm not sure how to turn up the logging level, though, so perhaps there would be useful info if I could do that.

Can you give me some suggestions on how to troubleshoot this issue?

Thanks,

Mike
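On turning up the logging level: the 0.8 clients log through log4j, so raising producer verbosity usually comes down to a line or two in the application's log4j.properties. A minimal sketch, assuming the application already defines its appenders:

    # Raise all Kafka client logging to DEBUG
    log4j.logger.kafka=DEBUG

    # Or, to keep the noise down, target just the producer packages
    log4j.logger.kafka.producer=DEBUG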