We're using Kafka 0.8.1.1.

-----Original Message-----
From: Jun Rao [mailto:jun...@gmail.com]
Sent: Monday, June 30, 2014 10:23 AM
To: users@kafka.apache.org
Subject: Re: Failed to send messages after 3 tries
Which version of Kafka are you using?

Thanks,

Jun

On Fri, Jun 27, 2014 at 11:57 AM, England, Michael <mengl...@homeadvisor.com> wrote:

> Neha,
>
> In state-change.log I see lots of logging from when I last started up
> kafka, and nothing after that. I do see a bunch of errors of the form:
> [2014-06-25 13:21:37,124] ERROR Controller 1 epoch 11 initiated state change for partition [lead.indexer,37] from OfflinePartition to OnlinePartition failed (state.change.logger)
> kafka.common.NoReplicaOnlineException: No replica for partition [lead.indexer,37] is alive. Live brokers are: [Set()], Assigned replicas are: [List(1)]
>         at kafka.controller.OfflinePartitionLeaderSelector.selectLeader(PartitionLeaderSelector.scala:61)
>         at kafka.controller.PartitionStateMachine.electLeaderForPartition(PartitionStateMachine.scala:336)
>         at kafka.controller.PartitionStateMachine.kafka$controller$PartitionStateMachine$$handleStateChange(PartitionStateMachine.scala:185)
>         at kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:99)
>         at kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:96)
>         at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:743)
>         at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:95)
>         at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:95)
>         at scala.collection.Iterator$class.foreach(Iterator.scala:772)
>         at scala.collection.mutable.HashTable$$anon$1.foreach(HashTable.scala:157)
>         at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:190)
>         at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:45)
>         at scala.collection.mutable.HashMap.foreach(HashMap.scala:95)
>         at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:742)
>         at kafka.controller.PartitionStateMachine.triggerOnlinePartitionStateChange(PartitionStateMachine.scala:96)
>         at kafka.controller.PartitionStateMachine.startup(PartitionStateMachine.scala:68)
>         at kafka.controller.KafkaController.onControllerFailover(KafkaController.scala:312)
>         at kafka.controller.KafkaController$$anonfun$1.apply$mcV$sp(KafkaController.scala:162)
>         at kafka.server.ZookeeperLeaderElector.elect(ZookeeperLeaderElector.scala:63)
>         at kafka.server.ZookeeperLeaderElector$$anonfun$startup$1.apply$mcZ$sp(ZookeeperLeaderElector.scala:49)
>         at kafka.server.ZookeeperLeaderElector$$anonfun$startup$1.apply(ZookeeperLeaderElector.scala:47)
>         at kafka.server.ZookeeperLeaderElector$$anonfun$startup$1.apply(ZookeeperLeaderElector.scala:47)
>         at kafka.utils.Utils$.inLock(Utils.scala:538)
>         at kafka.server.ZookeeperLeaderElector.startup(ZookeeperLeaderElector.scala:47)
>         at kafka.controller.KafkaController$$anonfun$startup$1.apply$mcV$sp(KafkaController.scala:637)
>         at kafka.controller.KafkaController$$anonfun$startup$1.apply(KafkaController.scala:633)
>         at kafka.controller.KafkaController$$anonfun$startup$1.apply(KafkaController.scala:633)
>         at kafka.utils.Utils$.inLock(Utils.scala:538)
>         at kafka.controller.KafkaController.startup(KafkaController.scala:633)
>         at kafka.server.KafkaServer.startup(KafkaServer.scala:96)
>         at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:34)
>         at kafka.Kafka$.main(Kafka.scala:46)
>         at kafka.Kafka.main(Kafka.scala)
>
> And also errors of the form:
> [2014-06-25 13:21:42,502] ERROR Broker 1 aborted the become-follower state change with correlation id 4 from controller 1 epoch 10 for partition [lead.indexer,11] new leader -1 (state.change.logger)
>
> Are either of these of concern?
>
> In controller.log I also see logging from start-up, and then nothing.
> There are no errors, but I do see some warnings. They seem rather benign.
> Here's a sample:
> [2014-06-25 13:21:47,678] WARN [OfflinePartitionLeaderSelector]: No broker in ISR is alive for [lead.indexer,45]. Elect leader 1 from live brokers 1. There's potential data loss. (kafka.controller.OfflinePartitionLeaderSelector)
> [2014-06-25 13:21:47,678] INFO [OfflinePartitionLeaderSelector]: Selected new leader and ISR {"leader":1,"leader_epoch":3,"isr":[1]} for offline partition [lead.indexer,45] (kafka.controller.OfflinePartitionLeaderSelector)
>
> In kafka.out I see this error message:
> [2014-06-27 11:50:01,366] ERROR Closing socket for /10.1.162.67 because of error (kafka.network.Processor)
> java.io.IOException: Connection reset by peer
>         at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
>         at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
>         at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:89)
>         at sun.nio.ch.IOUtil.write(IOUtil.java:60)
>         at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:450)
>         at kafka.api.FetchResponseSend.writeTo(FetchResponse.scala:217)
>         at kafka.network.Processor.write(SocketServer.scala:375)
>         at kafka.network.Processor.run(SocketServer.scala:247)
>         at java.lang.Thread.run(Thread.java:722)
>
> My understanding is that this is fine, and simply correlates with me
> shutting down a producer or consumer.
>
> Kafka-request.log is empty.
>
> In server.log there are just a few lines that look like this:
> [2014-06-27 12:04:10,620] INFO Closing socket connection to /10.1.162.67. (kafka.network.Processor)
> [2014-06-27 12:04:11,681] INFO Closing socket connection to /10.1.162.67. (kafka.network.Processor)
> [2014-06-27 12:12:40,561] INFO Closing socket connection to /10.3.230.131. (kafka.network.Processor)
> [2014-06-27 12:12:40,776] INFO Closing socket connection to /10.3.230.126. (kafka.network.Processor)
> [2014-06-27 12:12:40,776] INFO Closing socket connection to /10.3.230.126. (kafka.network.Processor)
> [2014-06-27 12:12:40,803] INFO Closing socket connection to /10.3.230.126. (kafka.network.Processor)
> [2014-06-27 12:12:40,804] INFO Closing socket connection to /10.3.230.126. (kafka.network.Processor)
> [2014-06-27 12:12:44,900] INFO Closing socket connection to /10.3.230.131. (kafka.network.Processor)
> [2014-06-27 12:17:44,242] INFO Closing socket connection to /10.1.162.114. (kafka.network.Processor)
>
> If you'd like to see more log output, please let me know the best way to
> send you the complete files. Some of the logs are large, and I'm reluctant
> to send them to the mailing list as attachments.
>
> Thanks,
>
> Mike
>
> -----Original Message-----
> From: Neha Narkhede [mailto:neha.narkh...@gmail.com]
> Sent: Friday, June 27, 2014 11:30 AM
> To: users@kafka.apache.org
> Subject: Re: Failed to send messages after 3 tries
>
> I'm not so sure what is causing those exceptions. When you send data, do
> you see any errors in the server logs? Could you send them around?
>
> On Fri, Jun 27, 2014 at 10:00 AM, England, Michael <mengl...@homeadvisor.com> wrote:
>
> > Neha,
> >
> > Apologies for the slow response. I was out yesterday.
> >
> > To answer your questions:
> > -- Is the LeaderNotAvailableException repeatable? Yes.
> >    It happens whenever I send a message to that topic.
> > -- Are you running Kafka in the cloud? No.
> >
> > Does this problem indicate that the topic is corrupt? If so, what would I
> > need to do to clean it up?
> >
> > Thanks,
> >
> > Mike
> >
> > -----Original Message-----
> > From: Neha Narkhede [mailto:neha.narkh...@gmail.com]
> > Sent: Wednesday, June 25, 2014 11:24 PM
> > To: users@kafka.apache.org
> > Subject: Re: Failed to send messages after 3 tries
> >
> > The output from the list topic tool suggests that a leader is available
> > for all partitions. Is the LeaderNotAvailableException repeatable? Are
> > you running Kafka in the cloud?
> >
> > On Wed, Jun 25, 2014 at 4:03 PM, England, Michael <mengl...@homeadvisor.com> wrote:
> >
> > > By the way, this is what I get when I describe the topic:
> > >
> > > Topic:lead.indexer  PartitionCount:53  ReplicationFactor:1  Configs:
> > >     Topic: lead.indexer  Partition: 0   Leader: 2  Replicas: 2  Isr: 2
> > >     Topic: lead.indexer  Partition: 1   Leader: 1  Replicas: 1  Isr: 1
> > >     Topic: lead.indexer  Partition: 2   Leader: 2  Replicas: 2  Isr: 2
> > >     Topic: lead.indexer  Partition: 3   Leader: 1  Replicas: 1  Isr: 1
> > >     Topic: lead.indexer  Partition: 4   Leader: 2  Replicas: 2  Isr: 2
> > >     Topic: lead.indexer  Partition: 5   Leader: 1  Replicas: 1  Isr: 1
> > >     Topic: lead.indexer  Partition: 6   Leader: 2  Replicas: 2  Isr: 2
> > >     Topic: lead.indexer  Partition: 7   Leader: 1  Replicas: 1  Isr: 1
> > >     Topic: lead.indexer  Partition: 8   Leader: 2  Replicas: 2  Isr: 2
> > >     Topic: lead.indexer  Partition: 9   Leader: 1  Replicas: 1  Isr: 1
> > >     Topic: lead.indexer  Partition: 10  Leader: 2  Replicas: 2  Isr: 2
> > >     Topic: lead.indexer  Partition: 11  Leader: 1  Replicas: 1  Isr: 1
> > >     Topic: lead.indexer  Partition: 12  Leader: 2  Replicas: 2  Isr: 2
> > >     Topic: lead.indexer  Partition: 13  Leader: 1  Replicas: 1  Isr: 1
> > >     Topic: lead.indexer  Partition: 14  Leader: 2  Replicas: 2  Isr: 2
> > >     Topic: lead.indexer  Partition: 15  Leader: 1  Replicas: 1  Isr: 1
> > >     Topic: lead.indexer  Partition: 16  Leader: 2  Replicas: 2  Isr: 2
> > >     Topic: lead.indexer  Partition: 17  Leader: 1  Replicas: 1  Isr: 1
> > >     Topic: lead.indexer  Partition: 18  Leader: 2  Replicas: 2  Isr: 2
> > >     Topic: lead.indexer  Partition: 19  Leader: 1  Replicas: 1  Isr: 1
> > >     Topic: lead.indexer  Partition: 20  Leader: 2  Replicas: 2  Isr: 2
> > >     Topic: lead.indexer  Partition: 21  Leader: 1  Replicas: 1  Isr: 1
> > >     Topic: lead.indexer  Partition: 22  Leader: 2  Replicas: 2  Isr: 2
> > >     Topic: lead.indexer  Partition: 23  Leader: 1  Replicas: 1  Isr: 1
> > >     Topic: lead.indexer  Partition: 24  Leader: 2  Replicas: 2  Isr: 2
> > >     Topic: lead.indexer  Partition: 25  Leader: 1  Replicas: 1  Isr: 1
> > >     Topic: lead.indexer  Partition: 26  Leader: 2  Replicas: 2  Isr: 2
> > >     Topic: lead.indexer  Partition: 27  Leader: 1  Replicas: 1  Isr: 1
> > >     Topic: lead.indexer  Partition: 28  Leader: 2  Replicas: 2  Isr: 2
> > >     Topic: lead.indexer  Partition: 29  Leader: 1  Replicas: 1  Isr: 1
> > >     Topic: lead.indexer  Partition: 30  Leader: 2  Replicas: 2  Isr: 2
> > >     Topic: lead.indexer  Partition: 31  Leader: 1  Replicas: 1  Isr: 1
> > >     Topic: lead.indexer  Partition: 32  Leader: 2  Replicas: 2  Isr: 2
> > >     Topic: lead.indexer  Partition: 33  Leader: 1  Replicas: 1  Isr: 1
> > >     Topic: lead.indexer  Partition: 34  Leader: 2  Replicas: 2  Isr: 2
> > >     Topic: lead.indexer  Partition: 35  Leader: 1  Replicas: 1  Isr: 1
> > >     Topic: lead.indexer  Partition: 36  Leader: 2  Replicas: 2  Isr: 2
> > >     Topic: lead.indexer  Partition: 37  Leader: 1  Replicas: 1  Isr: 1
> > >     Topic: lead.indexer  Partition: 38  Leader: 2  Replicas: 2  Isr: 2
> > >     Topic: lead.indexer  Partition: 39  Leader: 1  Replicas: 1  Isr: 1
> > >     Topic: lead.indexer  Partition: 40  Leader: 2  Replicas: 2  Isr: 2
> > >     Topic: lead.indexer  Partition: 41  Leader: 1  Replicas: 1  Isr: 1
> > >     Topic: lead.indexer  Partition: 42  Leader: 2  Replicas: 2  Isr: 2
> > >     Topic: lead.indexer  Partition: 43  Leader: 1  Replicas: 1  Isr: 1
> > >     Topic: lead.indexer  Partition: 44  Leader: 2  Replicas: 2  Isr: 2
> > >     Topic: lead.indexer  Partition: 45  Leader: 1  Replicas: 1  Isr: 1
> > >     Topic: lead.indexer  Partition: 46  Leader: 2  Replicas: 2  Isr: 2
> > >     Topic: lead.indexer  Partition: 47  Leader: 1  Replicas: 1  Isr: 1
> > >     Topic: lead.indexer  Partition: 48  Leader: 2  Replicas: 2  Isr: 2
> > >     Topic: lead.indexer  Partition: 49  Leader: 1  Replicas: 1  Isr: 1
> > >     Topic: lead.indexer  Partition: 50  Leader: 2  Replicas: 2  Isr: 2
> > >     Topic: lead.indexer  Partition: 51  Leader: 1  Replicas: 1  Isr: 1
> > >     Topic: lead.indexer  Partition: 52  Leader: 2  Replicas: 2  Isr: 2
> > >
> > > -----Original Message-----
> > > From: England, Michael
> > > Sent: Wednesday, June 25, 2014 4:58 PM
> > > To: users@kafka.apache.org
> > > Subject: RE: Failed to send messages after 3 tries
> > >
> > > OK, at WARN level I see the following:
> > >
> > > 2014-06-25 16:46:16 WARN kafka-consumer-sp_lead.index.processor1 kafka.producer.BrokerPartitionInfo - Error while fetching metadata [{TopicMetadata for topic lead.indexer -> No partition metadata for topic lead.indexer due to kafka.common.LeaderNotAvailableException}] for topic [lead.indexer]: class kafka.common.LeaderNotAvailableException
> > >
> > > Any suggestions about how to address this? I see that there are some
> > > threads about this in the mailing list archive. I'll start to look
> > > through them.
> > >
> > > Thanks,
> > >
> > > Mike
> > >
> > > -----Original Message-----
> > > From: Neha Narkhede [mailto:neha.narkh...@gmail.com]
> > > Sent: Wednesday, June 25, 2014 4:47 PM
> > > To: users@kafka.apache.org
> > > Subject: Re: Failed to send messages after 3 tries
> > >
> > > It should be at WARN.
> > >
> > > On Wed, Jun 25, 2014 at 3:42 PM, England, Michael <mengl...@homeadvisor.com> wrote:
> > >
> > > > Neha,
> > > >
> > > > I don't see that error message in the logs. The error that I included
> > > > in my original email is the only error that I see from Kafka.
> > > >
> > > > Do I need to change log levels to get the info that you need?
> > > >
> > > > Mike
> > > >
> > > > -----Original Message-----
> > > > From: Neha Narkhede [mailto:neha.narkh...@gmail.com]
> > > > Sent: Wednesday, June 25, 2014 4:31 PM
> > > > To: users@kafka.apache.org
> > > > Subject: Re: Failed to send messages after 3 tries
> > > >
> > > > Could you provide information on why each retry failed? Look for an
> > > > error message that says "Failed to send producer request".
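Since the retries here fail while fetching metadata (the BrokerPartitionInfo warning above reports LeaderNotAvailableException for lead.indexer even though the describe output shows a leader for every partition), one way to gather more information is to ask a broker directly for the topic's metadata from the machine the producer runs on, using the same kind of metadata request the producer relies on to find partition leaders. The sketch below uses the 0.8.x kafka.javaapi classes; the broker host, port, and client id are placeholders, not values taken from this thread.

    import java.util.Collections;

    import kafka.javaapi.PartitionMetadata;
    import kafka.javaapi.TopicMetadata;
    import kafka.javaapi.TopicMetadataRequest;
    import kafka.javaapi.TopicMetadataResponse;
    import kafka.javaapi.consumer.SimpleConsumer;

    public class LeadIndexerMetadataCheck {
        public static void main(String[] args) {
            // Placeholder broker address and client id -- substitute one of your brokers.
            SimpleConsumer consumer =
                new SimpleConsumer("broker-host", 9092, 100000, 64 * 1024, "metadata-check");
            try {
                // Ask the broker for metadata for the lead.indexer topic only.
                TopicMetadataRequest request =
                    new TopicMetadataRequest(Collections.singletonList("lead.indexer"));
                TopicMetadataResponse response = consumer.send(request);
                for (TopicMetadata topic : response.topicsMetadata()) {
                    for (PartitionMetadata partition : topic.partitionsMetadata()) {
                        // leader() is null when the broker has no leader to report for the partition.
                        String leader = (partition.leader() == null)
                                ? "NONE"
                                : partition.leader().host() + ":" + partition.leader().port();
                        System.out.println("partition " + partition.partitionId() + " -> leader " + leader);
                    }
                }
            } finally {
                consumer.close();
            }
        }
    }

If this reports a leader for every partition when run on the broker host but fails or shows no leaders when run from the producer host, that tends to point at connectivity or advertised-hostname problems rather than a corrupt topic.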
> > > >
> > > > On Wed, Jun 25, 2014 at 2:18 PM, England, Michael <mengl...@homeadvisor.com> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I get the following error from my producer when sending a message:
> > > > > Caused by: kafka.common.FailedToSendMessageException: Failed to send messages after 3 tries.
> > > > >         at kafka.producer.async.DefaultEventHandler.handle(DefaultEventHandler.scala:90)
> > > > >         at kafka.producer.Producer.send(Producer.scala:76)
> > > > >         at kafka.javaapi.producer.Producer.send(Producer.scala:42)
> > > > >         at com.servicemagic.kafka.producer.KafkaProducerTemplate.send(KafkaProducerTemplate.java:37)
> > > > >         ... 31 more
> > > > >
> > > > > The producer is running locally; the broker is on a different machine.
> > > > > I can telnet to the broker, so it isn't a network issue. Also, I have
> > > > > other producers that work fine using the same broker (but a different
> > > > > topic).
> > > > >
> > > > > I've checked the various logs on the broker, but I don't see anything
> > > > > obvious in them. I'm not sure how to turn up the logging level, though,
> > > > > so perhaps there would be useful info if I could do that.
> > > > >
> > > > > Can you give me some suggestions on how to troubleshoot this issue?
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Mike
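The "Failed to send messages after 3 tries" text comes from the old producer's retry loop: message.send.max.retries (default 3) bounds how many attempts the DefaultEventHandler in the stack trace makes before throwing FailedToSendMessageException, and the per-attempt "Failed to send producer request" warnings Neha mentions are what explain why each attempt failed. Below is a minimal, self-contained sketch of a producer for this topic with those settings spelled out; the broker list and message value are placeholders, and this is not the KafkaProducerTemplate class from the stack trace above.

    import java.util.Properties;

    import kafka.javaapi.producer.Producer;
    import kafka.producer.KeyedMessage;
    import kafka.producer.ProducerConfig;

    public class LeadIndexerProducerExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            // Placeholder broker list -- substitute your brokers.
            props.put("metadata.broker.list", "broker1:9092,broker2:9092");
            props.put("serializer.class", "kafka.serializer.StringEncoder");
            // Require an acknowledgement from the partition leader.
            props.put("request.required.acks", "1");
            // Number of attempts behind "Failed to send messages after N tries" (default 3).
            props.put("message.send.max.retries", "5");
            // Back off between attempts so a metadata refresh can pick up a new leader.
            props.put("retry.backoff.ms", "500");

            Producer<String, String> producer =
                new Producer<String, String>(new ProducerConfig(props));
            try {
                producer.send(new KeyedMessage<String, String>("lead.indexer", "test message"));
            } finally {
                producer.close();
            }
        }
    }

Raising the retry count or backoff only papers over the problem if metadata for the topic never resolves, so the metadata check sketched earlier in the thread is the more useful first step; as Neha notes, the per-retry warnings are logged at WARN on the client side, so they should appear once the client's log4j threshold for the Kafka producer classes is WARN or lower.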