Neha,

In state-change.log I see lots of logging from when I last started up Kafka, and nothing after that. I do see a bunch of errors of the form:

[2014-06-25 13:21:37,124] ERROR Controller 1 epoch 11 initiated state change for partition [lead.indexer,37] from OfflinePartition to OnlinePartition failed (state.change.logger)
kafka.common.NoReplicaOnlineException: No replica for partition [lead.indexer,37] is alive. Live brokers are: [Set()], Assigned replicas are: [List(1)]
        at kafka.controller.OfflinePartitionLeaderSelector.selectLeader(PartitionLeaderSelector.scala:61)
        at kafka.controller.PartitionStateMachine.electLeaderForPartition(PartitionStateMachine.scala:336)
        at kafka.controller.PartitionStateMachine.kafka$controller$PartitionStateMachine$$handleStateChange(PartitionStateMachine.scala:185)
        at kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:99)
        at kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:96)
        at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:743)
        at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:95)
        at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:95)
        at scala.collection.Iterator$class.foreach(Iterator.scala:772)
        at scala.collection.mutable.HashTable$$anon$1.foreach(HashTable.scala:157)
        at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:190)
        at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:45)
        at scala.collection.mutable.HashMap.foreach(HashMap.scala:95)
        at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:742)
        at kafka.controller.PartitionStateMachine.triggerOnlinePartitionStateChange(PartitionStateMachine.scala:96)
        at kafka.controller.PartitionStateMachine.startup(PartitionStateMachine.scala:68)
        at kafka.controller.KafkaController.onControllerFailover(KafkaController.scala:312)
        at kafka.controller.KafkaController$$anonfun$1.apply$mcV$sp(KafkaController.scala:162)
        at kafka.server.ZookeeperLeaderElector.elect(ZookeeperLeaderElector.scala:63)
        at kafka.server.ZookeeperLeaderElector$$anonfun$startup$1.apply$mcZ$sp(ZookeeperLeaderElector.scala:49)
        at kafka.server.ZookeeperLeaderElector$$anonfun$startup$1.apply(ZookeeperLeaderElector.scala:47)
        at kafka.server.ZookeeperLeaderElector$$anonfun$startup$1.apply(ZookeeperLeaderElector.scala:47)
        at kafka.utils.Utils$.inLock(Utils.scala:538)
        at kafka.server.ZookeeperLeaderElector.startup(ZookeeperLeaderElector.scala:47)
        at kafka.controller.KafkaController$$anonfun$startup$1.apply$mcV$sp(KafkaController.scala:637)
        at kafka.controller.KafkaController$$anonfun$startup$1.apply(KafkaController.scala:633)
        at kafka.controller.KafkaController$$anonfun$startup$1.apply(KafkaController.scala:633)
        at kafka.utils.Utils$.inLock(Utils.scala:538)
        at kafka.controller.KafkaController.startup(KafkaController.scala:633)
        at kafka.server.KafkaServer.startup(KafkaServer.scala:96)
        at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:34)
        at kafka.Kafka$.main(Kafka.scala:46)
        at kafka.Kafka.main(Kafka.scala)
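(For reference: the key detail in that exception is "Live brokers are: [Set()]" — at the moment the newly elected controller ran the leader election, it saw no broker ids registered under /brokers/ids in ZooKeeper, so it could not bring the partition online. A quick way to see what the controller sees is to read those znodes directly. The sketch below is not from this thread; the ZooKeeper address "zkhost:2181" is a placeholder, and it assumes the stock 0.8.x registration paths /brokers/ids and /controller.)

    import java.util.concurrent.CountDownLatch;
    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooKeeper;

    public class LiveBrokerCheck {
        public static void main(String[] args) throws Exception {
            final CountDownLatch connected = new CountDownLatch(1);
            // "zkhost:2181" is a placeholder for the broker's zookeeper.connect value.
            ZooKeeper zk = new ZooKeeper("zkhost:2181", 10000, new Watcher() {
                public void process(WatchedEvent event) {
                    if (event.getState() == Event.KeeperState.SyncConnected) {
                        connected.countDown();
                    }
                }
            });
            connected.await();
            // Broker ids listed here are what the controller treats as "live";
            // an empty list corresponds to "Live brokers are: [Set()]".
            System.out.println("Registered broker ids: " + zk.getChildren("/brokers/ids", false));
            // The current controller, if one has been elected, is recorded at /controller.
            if (zk.exists("/controller", false) != null) {
                System.out.println("Controller: " + new String(zk.getData("/controller", false, null)));
            } else {
                System.out.println("No controller znode found");
            }
            zk.close();
        }
    }

(An empty broker list right at startup can be transient, since the broker may not have re-registered by the time the controller runs the election; if it stays empty while the broker is up, that would point at a ZooKeeper registration problem.)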
I also see errors of the form:

[2014-06-25 13:21:42,502] ERROR Broker 1 aborted the become-follower state change with correlation id 4 from controller 1 epoch 10 for partition [lead.indexer,11] new leader -1 (state.change.logger)

Are either of these of concern?

In controller.log I also see logging from start-up, and then nothing. There are no errors, but I do see some warnings. They seem rather benign. Here's a sample:

[2014-06-25 13:21:47,678] WARN [OfflinePartitionLeaderSelector]: No broker in ISR is alive for [lead.indexer,45]. Elect leader 1 from live brokers 1. There's potential data loss. (kafka.controller.OfflinePartitionLeaderSelector)
[2014-06-25 13:21:47,678] INFO [OfflinePartitionLeaderSelector]: Selected new leader and ISR {"leader":1,"leader_epoch":3,"isr":[1]} for offline partition [lead.indexer,45] (kafka.controller.OfflinePartitionLeaderSelector)

In kafka.out I see this error message:

[2014-06-27 11:50:01,366] ERROR Closing socket for /10.1.162.67 because of error (kafka.network.Processor)
java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
        at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
        at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:89)
        at sun.nio.ch.IOUtil.write(IOUtil.java:60)
        at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:450)
        at kafka.api.FetchResponseSend.writeTo(FetchResponse.scala:217)
        at kafka.network.Processor.write(SocketServer.scala:375)
        at kafka.network.Processor.run(SocketServer.scala:247)
        at java.lang.Thread.run(Thread.java:722)

My understanding is that this is fine, and simply correlates with me shutting down a producer or consumer.

Kafka-request.log is empty.

In server.log there are just a few lines that look like this:

[2014-06-27 12:04:10,620] INFO Closing socket connection to /10.1.162.67. (kafka.network.Processor)
[2014-06-27 12:04:11,681] INFO Closing socket connection to /10.1.162.67. (kafka.network.Processor)
[2014-06-27 12:12:40,561] INFO Closing socket connection to /10.3.230.131. (kafka.network.Processor)
[2014-06-27 12:12:40,776] INFO Closing socket connection to /10.3.230.126. (kafka.network.Processor)
[2014-06-27 12:12:40,776] INFO Closing socket connection to /10.3.230.126. (kafka.network.Processor)
[2014-06-27 12:12:40,803] INFO Closing socket connection to /10.3.230.126. (kafka.network.Processor)
[2014-06-27 12:12:40,804] INFO Closing socket connection to /10.3.230.126. (kafka.network.Processor)
[2014-06-27 12:12:44,900] INFO Closing socket connection to /10.3.230.131. (kafka.network.Processor)
[2014-06-27 12:17:44,242] INFO Closing socket connection to /10.1.162.114. (kafka.network.Processor)

If you'd like to see more log output, please let me know the best way to send you the complete files. Some of the logs are large, and I'm reluctant to send them to the mailing list as attachments.

Thanks,

Mike

-----Original Message-----
From: Neha Narkhede [mailto:neha.narkh...@gmail.com]
Sent: Friday, June 27, 2014 11:30 AM
To: users@kafka.apache.org
Subject: Re: Failed to send messages after 3 tries

I'm not so sure what is causing those exceptions. When you send data, do you see any errors in the server logs? Could you send it around?

On Fri, Jun 27, 2014 at 10:00 AM, England, Michael <mengl...@homeadvisor.com> wrote:

> Neha,
>
> Apologies for the slow response. I was out yesterday.
>
> To answer your questions....
> -- Is the LeaderNotAvailableException repeatable? Yes. It happens whenever I send a message to that topic.
> -- Are you running Kafka in the cloud? No.
>
> Does this problem indicate that the topic is corrupt? If so, what would I need to do to clean it up?
>
> Thanks,
>
> Mike
>
>
> -----Original Message-----
> From: Neha Narkhede [mailto:neha.narkh...@gmail.com]
> Sent: Wednesday, June 25, 2014 11:24 PM
> To: users@kafka.apache.org
> Subject: Re: Failed to send messages after 3 tries
>
> The output from the list topic tool suggests that a leader is available for all partitions. Is the LeaderNotAvailableException repeatable? Are you running Kafka in the cloud?
>
>
> On Wed, Jun 25, 2014 at 4:03 PM, England, Michael <mengl...@homeadvisor.com> wrote:
>
> > By the way, this is what I get when I describe the topic:
> >
> > Topic:lead.indexer  PartitionCount:53  ReplicationFactor:1  Configs:
> >         Topic: lead.indexer  Partition: 0   Leader: 2  Replicas: 2  Isr: 2
> >         Topic: lead.indexer  Partition: 1   Leader: 1  Replicas: 1  Isr: 1
> >         Topic: lead.indexer  Partition: 2   Leader: 2  Replicas: 2  Isr: 2
> >         Topic: lead.indexer  Partition: 3   Leader: 1  Replicas: 1  Isr: 1
> >         Topic: lead.indexer  Partition: 4   Leader: 2  Replicas: 2  Isr: 2
> >         Topic: lead.indexer  Partition: 5   Leader: 1  Replicas: 1  Isr: 1
> >         Topic: lead.indexer  Partition: 6   Leader: 2  Replicas: 2  Isr: 2
> >         Topic: lead.indexer  Partition: 7   Leader: 1  Replicas: 1  Isr: 1
> >         Topic: lead.indexer  Partition: 8   Leader: 2  Replicas: 2  Isr: 2
> >         Topic: lead.indexer  Partition: 9   Leader: 1  Replicas: 1  Isr: 1
> >         Topic: lead.indexer  Partition: 10  Leader: 2  Replicas: 2  Isr: 2
> >         Topic: lead.indexer  Partition: 11  Leader: 1  Replicas: 1  Isr: 1
> >         Topic: lead.indexer  Partition: 12  Leader: 2  Replicas: 2  Isr: 2
> >         Topic: lead.indexer  Partition: 13  Leader: 1  Replicas: 1  Isr: 1
> >         Topic: lead.indexer  Partition: 14  Leader: 2  Replicas: 2  Isr: 2
> >         Topic: lead.indexer  Partition: 15  Leader: 1  Replicas: 1  Isr: 1
> >         Topic: lead.indexer  Partition: 16  Leader: 2  Replicas: 2  Isr: 2
> >         Topic: lead.indexer  Partition: 17  Leader: 1  Replicas: 1  Isr: 1
> >         Topic: lead.indexer  Partition: 18  Leader: 2  Replicas: 2  Isr: 2
> >         Topic: lead.indexer  Partition: 19  Leader: 1  Replicas: 1  Isr: 1
> >         Topic: lead.indexer  Partition: 20  Leader: 2  Replicas: 2  Isr: 2
> >         Topic: lead.indexer  Partition: 21  Leader: 1  Replicas: 1  Isr: 1
> >         Topic: lead.indexer  Partition: 22  Leader: 2  Replicas: 2  Isr: 2
> >         Topic: lead.indexer  Partition: 23  Leader: 1  Replicas: 1  Isr: 1
> >         Topic: lead.indexer  Partition: 24  Leader: 2  Replicas: 2  Isr: 2
> >         Topic: lead.indexer  Partition: 25  Leader: 1  Replicas: 1  Isr: 1
> >         Topic: lead.indexer  Partition: 26  Leader: 2  Replicas: 2  Isr: 2
> >         Topic: lead.indexer  Partition: 27  Leader: 1  Replicas: 1  Isr: 1
> >         Topic: lead.indexer  Partition: 28  Leader: 2  Replicas: 2  Isr: 2
> >         Topic: lead.indexer  Partition: 29  Leader: 1  Replicas: 1  Isr: 1
> >         Topic: lead.indexer  Partition: 30  Leader: 2  Replicas: 2  Isr: 2
> >         Topic: lead.indexer  Partition: 31  Leader: 1  Replicas: 1  Isr: 1
> >         Topic: lead.indexer  Partition: 32  Leader: 2  Replicas: 2  Isr: 2
> >         Topic: lead.indexer  Partition: 33  Leader: 1  Replicas: 1  Isr: 1
> >         Topic: lead.indexer  Partition: 34  Leader: 2  Replicas: 2  Isr: 2
> >         Topic: lead.indexer  Partition: 35  Leader: 1  Replicas: 1  Isr: 1
> >         Topic: lead.indexer  Partition: 36  Leader: 2  Replicas: 2  Isr: 2
> >         Topic: lead.indexer  Partition: 37  Leader: 1  Replicas: 1  Isr: 1
> >         Topic: lead.indexer  Partition: 38  Leader: 2  Replicas: 2  Isr: 2
> >         Topic: lead.indexer  Partition: 39  Leader: 1  Replicas: 1  Isr: 1
> >         Topic: lead.indexer  Partition: 40  Leader: 2  Replicas: 2  Isr: 2
> >         Topic: lead.indexer  Partition: 41  Leader: 1  Replicas: 1  Isr: 1
> >         Topic: lead.indexer  Partition: 42  Leader: 2  Replicas: 2  Isr: 2
> >         Topic: lead.indexer  Partition: 43  Leader: 1  Replicas: 1  Isr: 1
> >         Topic: lead.indexer  Partition: 44  Leader: 2  Replicas: 2  Isr: 2
> >         Topic: lead.indexer  Partition: 45  Leader: 1  Replicas: 1  Isr: 1
> >         Topic: lead.indexer  Partition: 46  Leader: 2  Replicas: 2  Isr: 2
> >         Topic: lead.indexer  Partition: 47  Leader: 1  Replicas: 1  Isr: 1
> >         Topic: lead.indexer  Partition: 48  Leader: 2  Replicas: 2  Isr: 2
> >         Topic: lead.indexer  Partition: 49  Leader: 1  Replicas: 1  Isr: 1
> >         Topic: lead.indexer  Partition: 50  Leader: 2  Replicas: 2  Isr: 2
> >         Topic: lead.indexer  Partition: 51  Leader: 1  Replicas: 1  Isr: 1
> >         Topic: lead.indexer  Partition: 52  Leader: 2  Replicas: 2  Isr: 2
> >
> > -----Original Message-----
> > From: England, Michael
> > Sent: Wednesday, June 25, 2014 4:58 PM
> > To: users@kafka.apache.org
> > Subject: RE: Failed to send messages after 3 tries
> >
> > Ok, at WARN level I see the following:
> >
> > 2014-06-25 16:46:16 WARN kafka-consumer-sp_lead.index.processor1 kafka.producer.BrokerPartitionInfo - Error while fetching metadata [{TopicMetadata for topic lead.indexer -> No partition metadata for topic lead.indexer due to kafka.common.LeaderNotAvailableException}] for topic [lead.indexer]: class kafka.common.LeaderNotAvailableException
> >
> > Any suggestions about how to address this? I see that there are some threads about this in the mailing list archive. I'll start to look through them.
> >
> > Thanks,
> >
> > Mike
> >
> > -----Original Message-----
> > From: Neha Narkhede [mailto:neha.narkh...@gmail.com]
> > Sent: Wednesday, June 25, 2014 4:47 PM
> > To: users@kafka.apache.org
> > Subject: Re: Failed to send messages after 3 tries
> >
> > It should be at WARN.
> >
> >
> > On Wed, Jun 25, 2014 at 3:42 PM, England, Michael <mengl...@homeadvisor.com> wrote:
> >
> > > Neha,
> > >
> > > I don’t see that error message in the logs. The error that I included in my original email is the only error that I see from Kafka.
> > >
> > > Do I need to change log levels to get the info that you need?
> > >
> > > Mike
> > >
> > > -----Original Message-----
> > > From: Neha Narkhede [mailto:neha.narkh...@gmail.com]
> > > Sent: Wednesday, June 25, 2014 4:31 PM
> > > To: users@kafka.apache.org
> > > Subject: Re: Failed to send messages after 3 tries
> > >
> > > Could you provide information on why each retry failed? Look for an error message that says "Failed to send producer request".
> > >
> > >
> > > On Wed, Jun 25, 2014 at 2:18 PM, England, Michael <mengl...@homeadvisor.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > I get the following error from my producer when sending a message:
> > > >
> > > > Caused by: kafka.common.FailedToSendMessageException: Failed to send messages after 3 tries.
> > > >         at kafka.producer.async.DefaultEventHandler.handle(DefaultEventHandler.scala:90)
> > > >         at kafka.producer.Producer.send(Producer.scala:76)
> > > >         at kafka.javaapi.producer.Producer.send(Producer.scala:42)
> > > >         at com.servicemagic.kafka.producer.KafkaProducerTemplate.send(KafkaProducerTemplate.java:37)
> > > >         ... 31 more
> > > >
> > > > The producer is running locally, the broker is on a different machine. I can telnet to the broker, so it isn't a network issue. Also, I have other producers that work fine using the same broker (but a different topic).
> > > >
> > > > I've checked the various logs on the broker, but I don't see anything obvious in them. I'm not sure how to turn up the logging level, though, so perhaps there would be useful info if I could do that.
> > > >
> > > > Can you give me some suggestions on how to troubleshoot this issue?
> > > >
> > > > Thanks,
> > > >
> > > > Mike
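(For context on the "Failed to send messages after 3 tries" error: with the 0.8 Java producer that the stack trace points at (kafka.javaapi.producer.Producer), the "3 tries" matches the default message.send.max.retries, and each try here appears to fail because the metadata fetch for lead.indexer keeps returning LeaderNotAvailableException. Below is a minimal sketch of that producer with the retry/backoff settings spelled out; the broker list, topic key/value, and retry values are placeholders, not configuration taken from this thread.)

    import java.util.Properties;
    import kafka.javaapi.producer.Producer;
    import kafka.producer.KeyedMessage;
    import kafka.producer.ProducerConfig;

    public class LeadIndexerProducerSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            // Placeholder broker list; use the host:port of the brokers in this cluster.
            props.put("metadata.broker.list", "broker1:9092,broker2:9092");
            props.put("serializer.class", "kafka.serializer.StringEncoder");
            props.put("request.required.acks", "1");
            // The default is 3, which is where "after 3 tries" comes from; raising the retry
            // count and backoff only helps if a leader becomes available within that window.
            props.put("message.send.max.retries", "5");
            props.put("retry.backoff.ms", "500");

            Producer<String, String> producer = new Producer<String, String>(new ProducerConfig(props));
            producer.send(new KeyedMessage<String, String>("lead.indexer", "test-key", "test-value"));
            producer.close();
        }
    }

(If the metadata warning keeps appearing, extra retries only mask the leader-availability problem rather than fix it, so the broker-side state is still the thing to chase.)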