There are a couple of things you can try.

1. See if brokers 1 and 2 are indeed registered in ZK (see the broker registration info in https://cwiki.apache.org/confluence/display/KAFKA/Kafka+data+structures+in+Zookeeper); a quick way to check is sketched below.
2. Does restarting brokers 1 and 2 solve the issue?
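For example, with the zookeeper-shell.sh tool that ships in the Kafka distribution's bin directory, something along these lines should work (zkhost:2181 below is a placeholder for your actual ZooKeeper connect string):

    # Open a shell against the ZooKeeper ensemble Kafka uses
    bin/zookeeper-shell.sh zkhost:2181

    # List the ids of all currently registered (live) brokers;
    # both 1 and 2 should show up here
    ls /brokers/ids

    # Inspect one broker's registration; it should contain the
    # broker's host and port
    get /brokers/ids/1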
Thanks,

Jun

On Tue, Jul 1, 2014 at 2:56 PM, England, Michael <mengl...@homeadvisor.com> wrote:

We're using kafka 0.8.1.1.

-----Original Message-----
From: Jun Rao [mailto:jun...@gmail.com]
Sent: Monday, June 30, 2014 10:23 AM
To: users@kafka.apache.org
Subject: Re: Failed to send messages after 3 tries

Which version of Kafka are you using?

Thanks,

Jun

On Fri, Jun 27, 2014 at 11:57 AM, England, Michael <mengl...@homeadvisor.com> wrote:

Neha,

In state-change.log I see lots of logging from when I last started up kafka, and nothing after that. I do see a bunch of errors of the form:

[2014-06-25 13:21:37,124] ERROR Controller 1 epoch 11 initiated state change for partition [lead.indexer,37] from OfflinePartition to OnlinePartition failed (state.change.logger)
kafka.common.NoReplicaOnlineException: No replica for partition [lead.indexer,37] is alive. Live brokers are: [Set()], Assigned replicas are: [List(1)]
        at kafka.controller.OfflinePartitionLeaderSelector.selectLeader(PartitionLeaderSelector.scala:61)
        at kafka.controller.PartitionStateMachine.electLeaderForPartition(PartitionStateMachine.scala:336)
        at kafka.controller.PartitionStateMachine.kafka$controller$PartitionStateMachine$$handleStateChange(PartitionStateMachine.scala:185)
        at kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:99)
        at kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:96)
        at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:743)
        at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:95)
        at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:95)
        at scala.collection.Iterator$class.foreach(Iterator.scala:772)
        at scala.collection.mutable.HashTable$$anon$1.foreach(HashTable.scala:157)
        at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:190)
        at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:45)
        at scala.collection.mutable.HashMap.foreach(HashMap.scala:95)
        at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:742)
        at kafka.controller.PartitionStateMachine.triggerOnlinePartitionStateChange(PartitionStateMachine.scala:96)
        at kafka.controller.PartitionStateMachine.startup(PartitionStateMachine.scala:68)
        at kafka.controller.KafkaController.onControllerFailover(KafkaController.scala:312)
        at kafka.controller.KafkaController$$anonfun$1.apply$mcV$sp(KafkaController.scala:162)
        at kafka.server.ZookeeperLeaderElector.elect(ZookeeperLeaderElector.scala:63)
        at kafka.server.ZookeeperLeaderElector$$anonfun$startup$1.apply$mcZ$sp(ZookeeperLeaderElector.scala:49)
        at kafka.server.ZookeeperLeaderElector$$anonfun$startup$1.apply(ZookeeperLeaderElector.scala:47)
        at kafka.server.ZookeeperLeaderElector$$anonfun$startup$1.apply(ZookeeperLeaderElector.scala:47)
        at kafka.utils.Utils$.inLock(Utils.scala:538)
        at kafka.server.ZookeeperLeaderElector.startup(ZookeeperLeaderElector.scala:47)
        at kafka.controller.KafkaController$$anonfun$startup$1.apply$mcV$sp(KafkaController.scala:637)
        at kafka.controller.KafkaController$$anonfun$startup$1.apply(KafkaController.scala:633)
        at kafka.controller.KafkaController$$anonfun$startup$1.apply(KafkaController.scala:633)
        at kafka.utils.Utils$.inLock(Utils.scala:538)
        at kafka.controller.KafkaController.startup(KafkaController.scala:633)
        at kafka.server.KafkaServer.startup(KafkaServer.scala:96)
        at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:34)
        at kafka.Kafka$.main(Kafka.scala:46)
        at kafka.Kafka.main(Kafka.scala)

And also errors of the form:

[2014-06-25 13:21:42,502] ERROR Broker 1 aborted the become-follower state change with correlation id 4 from controller 1 epoch 10 for partition [lead.indexer,11] new leader -1 (state.change.logger)

Are either of these of concern?

In controller.log I also see logging from start-up, and then nothing. There are no errors, but I do see some warnings. They seem rather benign. Here's a sample:

[2014-06-25 13:21:47,678] WARN [OfflinePartitionLeaderSelector]: No broker in ISR is alive for [lead.indexer,45]. Elect leader 1 from live brokers 1. There's potential data loss. (kafka.controller.OfflinePartitionLeaderSelector)
[2014-06-25 13:21:47,678] INFO [OfflinePartitionLeaderSelector]: Selected new leader and ISR {"leader":1,"leader_epoch":3,"isr":[1]} for offline partition [lead.indexer,45] (kafka.controller.OfflinePartitionLeaderSelector)

In kafka.out I see this error message:

[2014-06-27 11:50:01,366] ERROR Closing socket for /10.1.162.67 because of error (kafka.network.Processor)
java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
        at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
        at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:89)
        at sun.nio.ch.IOUtil.write(IOUtil.java:60)
        at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:450)
        at kafka.api.FetchResponseSend.writeTo(FetchResponse.scala:217)
        at kafka.network.Processor.write(SocketServer.scala:375)
        at kafka.network.Processor.run(SocketServer.scala:247)
        at java.lang.Thread.run(Thread.java:722)

My understanding is that this is fine, and simply correlates with me shutting down a producer or consumer.

kafka-request.log is empty.

In server.log there are just a few lines that look like this:

[2014-06-27 12:04:10,620] INFO Closing socket connection to /10.1.162.67. (kafka.network.Processor)
[2014-06-27 12:04:11,681] INFO Closing socket connection to /10.1.162.67. (kafka.network.Processor)
[2014-06-27 12:12:40,561] INFO Closing socket connection to /10.3.230.131. (kafka.network.Processor)
[2014-06-27 12:12:40,776] INFO Closing socket connection to /10.3.230.126. (kafka.network.Processor)
[2014-06-27 12:12:40,776] INFO Closing socket connection to /10.3.230.126. (kafka.network.Processor)
[2014-06-27 12:12:40,803] INFO Closing socket connection to /10.3.230.126. (kafka.network.Processor)
[2014-06-27 12:12:40,804] INFO Closing socket connection to /10.3.230.126. (kafka.network.Processor)
[2014-06-27 12:12:44,900] INFO Closing socket connection to /10.3.230.131. (kafka.network.Processor)
[2014-06-27 12:17:44,242] INFO Closing socket connection to /10.1.162.114. (kafka.network.Processor)
If you'd like to see more log output, please let me know the best way to send you the complete files. Some of the logs are large, and I'm reluctant to send them to the mailing list as attachments.

Thanks,

Mike

-----Original Message-----
From: Neha Narkhede [mailto:neha.narkh...@gmail.com]
Sent: Friday, June 27, 2014 11:30 AM
To: users@kafka.apache.org
Subject: Re: Failed to send messages after 3 tries

I'm not so sure what is causing those exceptions. When you send data, do you see any errors in the server logs? Could you send them around?

On Fri, Jun 27, 2014 at 10:00 AM, England, Michael <mengl...@homeadvisor.com> wrote:

Neha,

Apologies for the slow response. I was out yesterday.

To answer your questions:
-- Is the LeaderNotAvailableException repeatable? Yes. It happens whenever I send a message to that topic.
-- Are you running Kafka in the cloud? No.

Does this problem indicate that the topic is corrupt? If so, what would I need to do to clean it up?

Thanks,

Mike

-----Original Message-----
From: Neha Narkhede [mailto:neha.narkh...@gmail.com]
Sent: Wednesday, June 25, 2014 11:24 PM
To: users@kafka.apache.org
Subject: Re: Failed to send messages after 3 tries

The output from the list topic tool suggests that a leader is available for all partitions. Is the LeaderNotAvailableException repeatable? Are you running Kafka in the cloud?

On Wed, Jun 25, 2014 at 4:03 PM, England, Michael <mengl...@homeadvisor.com> wrote:

By the way, this is what I get when I describe the topic:

Topic:lead.indexer  PartitionCount:53  ReplicationFactor:1  Configs:
    Topic: lead.indexer  Partition: 0   Leader: 2  Replicas: 2  Isr: 2
    Topic: lead.indexer  Partition: 1   Leader: 1  Replicas: 1  Isr: 1
    Topic: lead.indexer  Partition: 2   Leader: 2  Replicas: 2  Isr: 2
    Topic: lead.indexer  Partition: 3   Leader: 1  Replicas: 1  Isr: 1
    Topic: lead.indexer  Partition: 4   Leader: 2  Replicas: 2  Isr: 2
    Topic: lead.indexer  Partition: 5   Leader: 1  Replicas: 1  Isr: 1
    Topic: lead.indexer  Partition: 6   Leader: 2  Replicas: 2  Isr: 2
    Topic: lead.indexer  Partition: 7   Leader: 1  Replicas: 1  Isr: 1
    Topic: lead.indexer  Partition: 8   Leader: 2  Replicas: 2  Isr: 2
    Topic: lead.indexer  Partition: 9   Leader: 1  Replicas: 1  Isr: 1
    Topic: lead.indexer  Partition: 10  Leader: 2  Replicas: 2  Isr: 2
    Topic: lead.indexer  Partition: 11  Leader: 1  Replicas: 1  Isr: 1
    Topic: lead.indexer  Partition: 12  Leader: 2  Replicas: 2  Isr: 2
    Topic: lead.indexer  Partition: 13  Leader: 1  Replicas: 1  Isr: 1
    Topic: lead.indexer  Partition: 14  Leader: 2  Replicas: 2  Isr: 2
    Topic: lead.indexer  Partition: 15  Leader: 1  Replicas: 1  Isr: 1
    Topic: lead.indexer  Partition: 16  Leader: 2  Replicas: 2  Isr: 2
    Topic: lead.indexer  Partition: 17  Leader: 1  Replicas: 1  Isr: 1
    Topic: lead.indexer  Partition: 18  Leader: 2  Replicas: 2  Isr: 2
    Topic: lead.indexer  Partition: 19  Leader: 1  Replicas: 1  Isr: 1
    Topic: lead.indexer  Partition: 20  Leader: 2  Replicas: 2  Isr: 2
    Topic: lead.indexer  Partition: 21  Leader: 1  Replicas: 1  Isr: 1
    Topic: lead.indexer  Partition: 22  Leader: 2  Replicas: 2  Isr: 2
    Topic: lead.indexer  Partition: 23  Leader: 1  Replicas: 1  Isr: 1
    Topic: lead.indexer  Partition: 24  Leader: 2  Replicas: 2  Isr: 2
    Topic: lead.indexer  Partition: 25  Leader: 1  Replicas: 1  Isr: 1
    Topic: lead.indexer  Partition: 26  Leader: 2  Replicas: 2  Isr: 2
    Topic: lead.indexer  Partition: 27  Leader: 1  Replicas: 1  Isr: 1
    Topic: lead.indexer  Partition: 28  Leader: 2  Replicas: 2  Isr: 2
    Topic: lead.indexer  Partition: 29  Leader: 1  Replicas: 1  Isr: 1
    Topic: lead.indexer  Partition: 30  Leader: 2  Replicas: 2  Isr: 2
    Topic: lead.indexer  Partition: 31  Leader: 1  Replicas: 1  Isr: 1
    Topic: lead.indexer  Partition: 32  Leader: 2  Replicas: 2  Isr: 2
    Topic: lead.indexer  Partition: 33  Leader: 1  Replicas: 1  Isr: 1
    Topic: lead.indexer  Partition: 34  Leader: 2  Replicas: 2  Isr: 2
    Topic: lead.indexer  Partition: 35  Leader: 1  Replicas: 1  Isr: 1
    Topic: lead.indexer  Partition: 36  Leader: 2  Replicas: 2  Isr: 2
    Topic: lead.indexer  Partition: 37  Leader: 1  Replicas: 1  Isr: 1
    Topic: lead.indexer  Partition: 38  Leader: 2  Replicas: 2  Isr: 2
    Topic: lead.indexer  Partition: 39  Leader: 1  Replicas: 1  Isr: 1
    Topic: lead.indexer  Partition: 40  Leader: 2  Replicas: 2  Isr: 2
    Topic: lead.indexer  Partition: 41  Leader: 1  Replicas: 1  Isr: 1
    Topic: lead.indexer  Partition: 42  Leader: 2  Replicas: 2  Isr: 2
    Topic: lead.indexer  Partition: 43  Leader: 1  Replicas: 1  Isr: 1
    Topic: lead.indexer  Partition: 44  Leader: 2  Replicas: 2  Isr: 2
    Topic: lead.indexer  Partition: 45  Leader: 1  Replicas: 1  Isr: 1
    Topic: lead.indexer  Partition: 46  Leader: 2  Replicas: 2  Isr: 2
    Topic: lead.indexer  Partition: 47  Leader: 1  Replicas: 1  Isr: 1
    Topic: lead.indexer  Partition: 48  Leader: 2  Replicas: 2  Isr: 2
    Topic: lead.indexer  Partition: 49  Leader: 1  Replicas: 1  Isr: 1
    Topic: lead.indexer  Partition: 50  Leader: 2  Replicas: 2  Isr: 2
    Topic: lead.indexer  Partition: 51  Leader: 1  Replicas: 1  Isr: 1
    Topic: lead.indexer  Partition: 52  Leader: 2  Replicas: 2  Isr: 2
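For reference, output like the above comes from the topic admin tool; on 0.8.1.x the invocation would be something like the following, with zkhost:2181 again standing in for the real ZooKeeper connect string:

    bin/kafka-topics.sh --describe --zookeeper zkhost:2181 --topic lead.indexer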
-----Original Message-----
From: England, Michael
Sent: Wednesday, June 25, 2014 4:58 PM
To: users@kafka.apache.org
Subject: RE: Failed to send messages after 3 tries

Ok, at WARN level I see the following:

2014-06-25 16:46:16 WARN kafka-consumer-sp_lead.index.processor1 kafka.producer.BrokerPartitionInfo - Error while fetching metadata [{TopicMetadata for topic lead.indexer -> No partition metadata for topic lead.indexer due to kafka.common.LeaderNotAvailableException}] for topic [lead.indexer]: class kafka.common.LeaderNotAvailableException

Any suggestions about how to address this? I see that there are some threads about this in the mailing list archive. I'll start to look through them.

Thanks,

Mike
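A note on the retry behavior surfacing here: the 0.8 producer re-fetches metadata and retries a failed send a fixed number of times before throwing FailedToSendMessageException, and those limits are configurable. A sketch of the relevant producer properties; the names are the documented 0.8 settings, while the values are illustrative only:

    # Number of retries before the producer gives up; the default of 3
    # is where "Failed to send messages after 3 tries" comes from
    message.send.max.retries=5

    # Pause before each retry, which also gives leader election time
    # to complete (milliseconds)
    retry.backoff.ms=500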
-----Original Message-----
From: Neha Narkhede [mailto:neha.narkh...@gmail.com]
Sent: Wednesday, June 25, 2014 4:47 PM
To: users@kafka.apache.org
Subject: Re: Failed to send messages after 3 tries

It should be at WARN.

On Wed, Jun 25, 2014 at 3:42 PM, England, Michael <mengl...@homeadvisor.com> wrote:

Neha,

I don't see that error message in the logs. The error that I included in my original email is the only error that I see from Kafka.

Do I need to change log levels to get the info that you need?

Mike

-----Original Message-----
From: Neha Narkhede [mailto:neha.narkh...@gmail.com]
Sent: Wednesday, June 25, 2014 4:31 PM
To: users@kafka.apache.org
Subject: Re: Failed to send messages after 3 tries

Could you provide information on why each retry failed? Look for an error message that says "Failed to send producer request".

On Wed, Jun 25, 2014 at 2:18 PM, England, Michael <mengl...@homeadvisor.com> wrote:

Hi,

I get the following error from my producer when sending a message:

Caused by: kafka.common.FailedToSendMessageException: Failed to send messages after 3 tries.
        at kafka.producer.async.DefaultEventHandler.handle(DefaultEventHandler.scala:90)
        at kafka.producer.Producer.send(Producer.scala:76)
        at kafka.javaapi.producer.Producer.send(Producer.scala:42)
        at com.servicemagic.kafka.producer.KafkaProducerTemplate.send(KafkaProducerTemplate.java:37)
        ... 31 more

The producer is running locally; the broker is on a different machine. I can telnet to the broker, so it isn't a network issue. Also, I have other producers that work fine using the same broker (but a different topic).

I've checked the various logs on the broker, but I don't see anything obvious in them. I'm not sure how to turn up the logging level, though, so perhaps there would be useful info if I could do that.

Can you give me some suggestions on how to troubleshoot this issue?

Thanks,

Mike
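On turning up the logging level: the 0.8 clients log through log4j, so raising producer verbosity usually comes down to a line or two in the application's log4j.properties. A minimal sketch, assuming the application already defines its appenders:

    # Raise all Kafka client logging to DEBUG
    log4j.logger.kafka=DEBUG

    # Or, to keep the noise down, target just the producer packages
    log4j.logger.kafka.producer=DEBUG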