We're using Kafka 0.8.1.1.

-----Original Message-----
From: Jun Rao [mailto:jun...@gmail.com]
Sent: Monday, June 30, 2014 10:23 AM
To: users@kafka.apache.org
Subject: Re: Failed to send messages after 3 tries
Which version of Kafka are you using?

Thanks,

Jun

On Fri, Jun 27, 2014 at 11:57 AM, England, Michael <mengl...@homeadvisor.com> wrote:

> Neha,
>
> In state-change.log I see lots of logging from when I last started up
> kafka, and nothing after that. I do see a bunch of errors of the form:
> [2014-06-25 13:21:37,124] ERROR Controller 1 epoch 11 initiated state change for partition [lead.indexer,37] from OfflinePartition to OnlinePartition failed (state.change.logger)
> kafka.common.NoReplicaOnlineException: No replica for partition [lead.indexer,37] is alive. Live brokers are: [Set()], Assigned replicas are: [List(1)]
>         at kafka.controller.OfflinePartitionLeaderSelector.selectLeader(PartitionLeaderSelector.scala:61)
>         at kafka.controller.PartitionStateMachine.electLeaderForPartition(PartitionStateMachine.scala:336)
>         at kafka.controller.PartitionStateMachine.kafka$controller$PartitionStateMachine$$handleStateChange(PartitionStateMachine.scala:185)
>         at kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:99)
>         at kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:96)
>         at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:743)
>         at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:95)
>         at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:95)
>         at scala.collection.Iterator$class.foreach(Iterator.scala:772)
>         at scala.collection.mutable.HashTable$$anon$1.foreach(HashTable.scala:157)
>         at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:190)
>         at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:45)
>         at scala.collection.mutable.HashMap.foreach(HashMap.scala:95)
>         at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:742)
>         at kafka.controller.PartitionStateMachine.triggerOnlinePartitionStateChange(PartitionStateMachine.scala:96)
>         at kafka.controller.PartitionStateMachine.startup(PartitionStateMachine.scala:68)
>         at kafka.controller.KafkaController.onControllerFailover(KafkaController.scala:312)
>         at kafka.controller.KafkaController$$anonfun$1.apply$mcV$sp(KafkaController.scala:162)
>         at kafka.server.ZookeeperLeaderElector.elect(ZookeeperLeaderElector.scala:63)
>         at kafka.server.ZookeeperLeaderElector$$anonfun$startup$1.apply$mcZ$sp(ZookeeperLeaderElector.scala:49)
>         at kafka.server.ZookeeperLeaderElector$$anonfun$startup$1.apply(ZookeeperLeaderElector.scala:47)
>         at kafka.server.ZookeeperLeaderElector$$anonfun$startup$1.apply(ZookeeperLeaderElector.scala:47)
>         at kafka.utils.Utils$.inLock(Utils.scala:538)
>         at kafka.server.ZookeeperLeaderElector.startup(ZookeeperLeaderElector.scala:47)
>         at kafka.controller.KafkaController$$anonfun$startup$1.apply$mcV$sp(KafkaController.scala:637)
>         at kafka.controller.KafkaController$$anonfun$startup$1.apply(KafkaController.scala:633)
>         at kafka.controller.KafkaController$$anonfun$startup$1.apply(KafkaController.scala:633)
>         at kafka.utils.Utils$.inLock(Utils.scala:538)
>         at kafka.controller.KafkaController.startup(KafkaController.scala:633)
>         at kafka.server.KafkaServer.startup(KafkaServer.scala:96)
>         at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:34)
>         at kafka.Kafka$.main(Kafka.scala:46)
>         at kafka.Kafka.main(Kafka.scala)
>
> And also errors of the form:
> [2014-06-25 13:21:42,502] ERROR Broker 1 aborted the become-follower state change with correlation id 4 from controller 1 epoch 10 for partition [lead.indexer,11] new leader -1 (state.change.logger)
>
> Are either of these of concern?
>
> In controller.log I also see logging from start-up, and then nothing.
> There are no errors, but I do see some warnings. They seem rather benign.
> Here's a sample:
> [2014-06-25 13:21:47,678] WARN [OfflinePartitionLeaderSelector]: No broker in ISR is alive for [lead.indexer,45]. Elect leader 1 from live brokers 1. There's potential data loss. (kafka.controller.OfflinePartitionLeaderSelector)
> [2014-06-25 13:21:47,678] INFO [OfflinePartitionLeaderSelector]: Selected new leader and ISR {"leader":1,"leader_epoch":3,"isr":[1]} for offline partition [lead.indexer,45] (kafka.controller.OfflinePartitionLeaderSelector)
>
> In kafka.out I see this error message:
> [2014-06-27 11:50:01,366] ERROR Closing socket for /10.1.162.67 because of error (kafka.network.Processor)
> java.io.IOException: Connection reset by peer
>         at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
>         at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
>         at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:89)
>         at sun.nio.ch.IOUtil.write(IOUtil.java:60)
>         at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:450)
>         at kafka.api.FetchResponseSend.writeTo(FetchResponse.scala:217)
>         at kafka.network.Processor.write(SocketServer.scala:375)
>         at kafka.network.Processor.run(SocketServer.scala:247)
>         at java.lang.Thread.run(Thread.java:722)
>
> My understanding is that this is fine, and simply correlates with me
> shutting down a producer or consumer.
>
> Kafka-request.log is empty.
>
> In server.log there are just a few lines that look like this:
> [2014-06-27 12:04:10,620] INFO Closing socket connection to /10.1.162.67. (kafka.network.Processor)
> [2014-06-27 12:04:11,681] INFO Closing socket connection to /10.1.162.67. (kafka.network.Processor)
> [2014-06-27 12:12:40,561] INFO Closing socket connection to /10.3.230.131. (kafka.network.Processor)
> [2014-06-27 12:12:40,776] INFO Closing socket connection to /10.3.230.126. (kafka.network.Processor)
> [2014-06-27 12:12:40,776] INFO Closing socket connection to /10.3.230.126. (kafka.network.Processor)
> [2014-06-27 12:12:40,803] INFO Closing socket connection to /10.3.230.126. (kafka.network.Processor)
> [2014-06-27 12:12:40,804] INFO Closing socket connection to /10.3.230.126. (kafka.network.Processor)
> [2014-06-27 12:12:44,900] INFO Closing socket connection to /10.3.230.131. (kafka.network.Processor)
> [2014-06-27 12:17:44,242] INFO Closing socket connection to /10.1.162.114. (kafka.network.Processor)
>
> If you'd like to see more log output, please let me know the best way to
> send you the complete files. Some of the logs are large, and I'm reluctant
> to send them to the mailing list as attachments.
>
> Thanks,
>
> Mike
>
> -----Original Message-----
> From: Neha Narkhede [mailto:neha.narkh...@gmail.com]
> Sent: Friday, June 27, 2014 11:30 AM
> To: users@kafka.apache.org
> Subject: Re: Failed to send messages after 3 tries
>
> I'm not so sure what is causing those exceptions. When you send data, do
> you see any errors in the server logs? Could you send them around?
>
> On Fri, Jun 27, 2014 at 10:00 AM, England, Michael <mengl...@homeadvisor.com> wrote:
>
> > Neha,
> >
> > Apologies for the slow response. I was out yesterday.
> >
> > To answer your questions:
> > -- Is the LeaderNotAvailableException repeatable? Yes.
> >    It happens whenever I send a message to that topic.
> > -- Are you running Kafka in the cloud? No.
> >
> > Does this problem indicate that the topic is corrupt? If so, what would I
> > need to do to clean it up?
> >
> > Thanks,
> >
> > Mike
> >
> > -----Original Message-----
> > From: Neha Narkhede [mailto:neha.narkh...@gmail.com]
> > Sent: Wednesday, June 25, 2014 11:24 PM
> > To: users@kafka.apache.org
> > Subject: Re: Failed to send messages after 3 tries
> >
> > The output from the list topic tool suggests that a leader is available
> > for all partitions. Is the LeaderNotAvailableException repeatable? Are
> > you running Kafka in the cloud?
> >
> > On Wed, Jun 25, 2014 at 4:03 PM, England, Michael <mengl...@homeadvisor.com> wrote:
> >
> > > By the way, this is what I get when I describe the topic:
> > >
> > > Topic:lead.indexer  PartitionCount:53  ReplicationFactor:1  Configs:
> > >     Topic: lead.indexer  Partition: 0   Leader: 2  Replicas: 2  Isr: 2
> > >     Topic: lead.indexer  Partition: 1   Leader: 1  Replicas: 1  Isr: 1
> > >     Topic: lead.indexer  Partition: 2   Leader: 2  Replicas: 2  Isr: 2
> > >     Topic: lead.indexer  Partition: 3   Leader: 1  Replicas: 1  Isr: 1
> > >     Topic: lead.indexer  Partition: 4   Leader: 2  Replicas: 2  Isr: 2
> > >     Topic: lead.indexer  Partition: 5   Leader: 1  Replicas: 1  Isr: 1
> > >     Topic: lead.indexer  Partition: 6   Leader: 2  Replicas: 2  Isr: 2
> > >     Topic: lead.indexer  Partition: 7   Leader: 1  Replicas: 1  Isr: 1
> > >     Topic: lead.indexer  Partition: 8   Leader: 2  Replicas: 2  Isr: 2
> > >     Topic: lead.indexer  Partition: 9   Leader: 1  Replicas: 1  Isr: 1
> > >     Topic: lead.indexer  Partition: 10  Leader: 2  Replicas: 2  Isr: 2
> > >     Topic: lead.indexer  Partition: 11  Leader: 1  Replicas: 1  Isr: 1
> > >     Topic: lead.indexer  Partition: 12  Leader: 2  Replicas: 2  Isr: 2
> > >     Topic: lead.indexer  Partition: 13  Leader: 1  Replicas: 1  Isr: 1
> > >     Topic: lead.indexer  Partition: 14  Leader: 2  Replicas: 2  Isr: 2
> > >     Topic: lead.indexer  Partition: 15  Leader: 1  Replicas: 1  Isr: 1
> > >     Topic: lead.indexer  Partition: 16  Leader: 2  Replicas: 2  Isr: 2
> > >     Topic: lead.indexer  Partition: 17  Leader: 1  Replicas: 1  Isr: 1
> > >     Topic: lead.indexer  Partition: 18  Leader: 2  Replicas: 2  Isr: 2
> > >     Topic: lead.indexer  Partition: 19  Leader: 1  Replicas: 1  Isr: 1
> > >     Topic: lead.indexer  Partition: 20  Leader: 2  Replicas: 2  Isr: 2
> > >     Topic: lead.indexer  Partition: 21  Leader: 1  Replicas: 1  Isr: 1
> > >     Topic: lead.indexer  Partition: 22  Leader: 2  Replicas: 2  Isr: 2
> > >     Topic: lead.indexer  Partition: 23  Leader: 1  Replicas: 1  Isr: 1
> > >     Topic: lead.indexer  Partition: 24  Leader: 2  Replicas: 2  Isr: 2
> > >     Topic: lead.indexer  Partition: 25  Leader: 1  Replicas: 1  Isr: 1
> > >     Topic: lead.indexer  Partition: 26  Leader: 2  Replicas: 2  Isr: 2
> > >     Topic: lead.indexer  Partition: 27  Leader: 1  Replicas: 1  Isr: 1
> > >     Topic: lead.indexer  Partition: 28  Leader: 2  Replicas: 2  Isr: 2
> > >     Topic: lead.indexer  Partition: 29  Leader: 1  Replicas: 1  Isr: 1
> > >     Topic: lead.indexer  Partition: 30  Leader: 2  Replicas: 2  Isr: 2
> > >     Topic: lead.indexer  Partition: 31  Leader: 1  Replicas: 1  Isr: 1
> > >     Topic: lead.indexer  Partition: 32  Leader: 2  Replicas: 2  Isr: 2
> > >     Topic: lead.indexer  Partition: 33  Leader: 1  Replicas: 1  Isr: 1
> > >     Topic: lead.indexer  Partition: 34  Leader: 2  Replicas: 2  Isr: 2
> > >     Topic: lead.indexer  Partition: 35  Leader: 1  Replicas: 1  Isr: 1
> > >     Topic: lead.indexer  Partition: 36  Leader: 2  Replicas: 2  Isr: 2
> > >     Topic: lead.indexer  Partition: 37  Leader: 1  Replicas: 1  Isr: 1
> > >     Topic: lead.indexer  Partition: 38  Leader: 2  Replicas: 2  Isr: 2
> > >     Topic: lead.indexer  Partition: 39  Leader: 1  Replicas: 1  Isr: 1
> > >     Topic: lead.indexer  Partition: 40  Leader: 2  Replicas: 2  Isr: 2
> > >     Topic: lead.indexer  Partition: 41  Leader: 1  Replicas: 1  Isr: 1
> > >     Topic: lead.indexer  Partition: 42  Leader: 2  Replicas: 2  Isr: 2
> > >     Topic: lead.indexer  Partition: 43  Leader: 1  Replicas: 1  Isr: 1
> > >     Topic: lead.indexer  Partition: 44  Leader: 2  Replicas: 2  Isr: 2
> > >     Topic: lead.indexer  Partition: 45  Leader: 1  Replicas: 1  Isr: 1
> > >     Topic: lead.indexer  Partition: 46  Leader: 2  Replicas: 2  Isr: 2
> > >     Topic: lead.indexer  Partition: 47  Leader: 1  Replicas: 1  Isr: 1
> > >     Topic: lead.indexer  Partition: 48  Leader: 2  Replicas: 2  Isr: 2
> > >     Topic: lead.indexer  Partition: 49  Leader: 1  Replicas: 1  Isr: 1
> > >     Topic: lead.indexer  Partition: 50  Leader: 2  Replicas: 2  Isr: 2
> > >     Topic: lead.indexer  Partition: 51  Leader: 1  Replicas: 1  Isr: 1
> > >     Topic: lead.indexer  Partition: 52  Leader: 2  Replicas: 2  Isr: 2
> > >
> > > -----Original Message-----
> > > From: England, Michael
> > > Sent: Wednesday, June 25, 2014 4:58 PM
> > > To: users@kafka.apache.org
> > > Subject: RE: Failed to send messages after 3 tries
> > >
> > > OK, at WARN level I see the following:
> > >
> > > 2014-06-25 16:46:16 WARN kafka-consumer-sp_lead.index.processor1 kafka.producer.BrokerPartitionInfo - Error while fetching metadata [{TopicMetadata for topic lead.indexer -> No partition metadata for topic lead.indexer due to kafka.common.LeaderNotAvailableException}] for topic [lead.indexer]: class kafka.common.LeaderNotAvailableException
> > >
> > > Any suggestions about how to address this? I see that there are some
> > > threads about this in the mailing list archive. I'll start to look
> > > through them.
> > >
> > > Thanks,
> > >
> > > Mike
> > >
> > > -----Original Message-----
> > > From: Neha Narkhede [mailto:neha.narkh...@gmail.com]
> > > Sent: Wednesday, June 25, 2014 4:47 PM
> > > To: users@kafka.apache.org
> > > Subject: Re: Failed to send messages after 3 tries
> > >
> > > It should be at WARN.
> > >
> > > On Wed, Jun 25, 2014 at 3:42 PM, England, Michael <mengl...@homeadvisor.com> wrote:
> > >
> > > > Neha,
> > > >
> > > > I don't see that error message in the logs. The error that I included
> > > > in my original email is the only error that I see from Kafka.
> > > >
> > > > Do I need to change log levels to get the info that you need?
> > > >
> > > > Mike
> > > >
> > > > -----Original Message-----
> > > > From: Neha Narkhede [mailto:neha.narkh...@gmail.com]
> > > > Sent: Wednesday, June 25, 2014 4:31 PM
> > > > To: users@kafka.apache.org
> > > > Subject: Re: Failed to send messages after 3 tries
> > > >
> > > > Could you provide information on why each retry failed? Look for an
> > > > error message that says "Failed to send producer request".
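Since the retries here fail while fetching metadata (the BrokerPartitionInfo warning above reports LeaderNotAvailableException for lead.indexer even though the describe output shows a leader for every partition), one way to gather more information is to ask a broker directly for the topic's metadata from the machine the producer runs on, using the same kind of metadata request the producer relies on to find partition leaders. The sketch below uses the 0.8.x kafka.javaapi classes; the broker host, port, and client id are placeholders, not values taken from this thread.

    import java.util.Collections;

    import kafka.javaapi.PartitionMetadata;
    import kafka.javaapi.TopicMetadata;
    import kafka.javaapi.TopicMetadataRequest;
    import kafka.javaapi.TopicMetadataResponse;
    import kafka.javaapi.consumer.SimpleConsumer;

    public class LeadIndexerMetadataCheck {
        public static void main(String[] args) {
            // Placeholder broker address and client id -- substitute one of your brokers.
            SimpleConsumer consumer =
                new SimpleConsumer("broker-host", 9092, 100000, 64 * 1024, "metadata-check");
            try {
                // Ask the broker for metadata for the lead.indexer topic only.
                TopicMetadataRequest request =
                    new TopicMetadataRequest(Collections.singletonList("lead.indexer"));
                TopicMetadataResponse response = consumer.send(request);
                for (TopicMetadata topic : response.topicsMetadata()) {
                    for (PartitionMetadata partition : topic.partitionsMetadata()) {
                        // leader() is null when the broker has no leader to report for the partition.
                        String leader = (partition.leader() == null)
                                ? "NONE"
                                : partition.leader().host() + ":" + partition.leader().port();
                        System.out.println("partition " + partition.partitionId() + " -> leader " + leader);
                    }
                }
            } finally {
                consumer.close();
            }
        }
    }

If this reports a leader for every partition when run on the broker host but fails or shows no leaders when run from the producer host, that tends to point at connectivity or advertised-hostname problems rather than a corrupt topic.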
> > > >
> > > > On Wed, Jun 25, 2014 at 2:18 PM, England, Michael <mengl...@homeadvisor.com> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I get the following error from my producer when sending a message:
> > > > > Caused by: kafka.common.FailedToSendMessageException: Failed to send messages after 3 tries.
> > > > >         at kafka.producer.async.DefaultEventHandler.handle(DefaultEventHandler.scala:90)
> > > > >         at kafka.producer.Producer.send(Producer.scala:76)
> > > > >         at kafka.javaapi.producer.Producer.send(Producer.scala:42)
> > > > >         at com.servicemagic.kafka.producer.KafkaProducerTemplate.send(KafkaProducerTemplate.java:37)
> > > > >         ... 31 more
> > > > >
> > > > > The producer is running locally; the broker is on a different machine.
> > > > > I can telnet to the broker, so it isn't a network issue. Also, I have
> > > > > other producers that work fine using the same broker (but a different
> > > > > topic).
> > > > >
> > > > > I've checked the various logs on the broker, but I don't see anything
> > > > > obvious in them. I'm not sure how to turn up the logging level, though,
> > > > > so perhaps there would be useful info if I could do that.
> > > > >
> > > > > Can you give me some suggestions on how to troubleshoot this issue?
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Mike
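The "Failed to send messages after 3 tries" text comes from the old producer's retry loop: message.send.max.retries (default 3) bounds how many attempts the DefaultEventHandler in the stack trace makes before throwing FailedToSendMessageException, and the per-attempt "Failed to send producer request" warnings Neha mentions are what explain why each attempt failed. Below is a minimal, self-contained sketch of a producer for this topic with those settings spelled out; the broker list and message value are placeholders, and this is not the KafkaProducerTemplate class from the stack trace above.

    import java.util.Properties;

    import kafka.javaapi.producer.Producer;
    import kafka.producer.KeyedMessage;
    import kafka.producer.ProducerConfig;

    public class LeadIndexerProducerExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            // Placeholder broker list -- substitute your brokers.
            props.put("metadata.broker.list", "broker1:9092,broker2:9092");
            props.put("serializer.class", "kafka.serializer.StringEncoder");
            // Require an acknowledgement from the partition leader.
            props.put("request.required.acks", "1");
            // Number of attempts behind "Failed to send messages after N tries" (default 3).
            props.put("message.send.max.retries", "5");
            // Back off between attempts so a metadata refresh can pick up a new leader.
            props.put("retry.backoff.ms", "500");

            Producer<String, String> producer =
                new Producer<String, String>(new ProducerConfig(props));
            try {
                producer.send(new KeyedMessage<String, String>("lead.indexer", "test message"));
            } finally {
                producer.close();
            }
        }
    }

Raising the retry count or backoff only papers over the problem if metadata for the topic never resolves, so the metadata check sketched earlier in the thread is the more useful first step; as Neha notes, the per-retry warnings are logged at WARN on the client side, so they should appear once the client's log4j threshold for the Kafka producer classes is WARN or lower.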