[ https://issues.apache.org/jira/browse/KAFKA-1460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093200#comment-14093200 ]

Ryan Williams commented on KAFKA-1460:
--------------------------------------

I'm looking into this as well. Kafka was running fine on Friday, but when I came
back on Monday, producers were unable to send messages. I've also posted to the
users list and will see if that surfaces anything to look at.

===============
Producer error
===============
[2014-08-11 19:32:49,781] WARN Error while fetching metadata [{TopicMetadata for topic mytopic -> 
No partition metadata for topic mytopic due to kafka.common.LeaderNotAvailableException}] for topic [mytopic]: class kafka.common.LeaderNotAvailableException  (kafka.producer.BrokerPartitionInfo)
[2014-08-11 19:32:49,782] ERROR Failed to collate messages by topic, partition due to: Failed to fetch topic metadata for topic: mytopic (kafka.producer.async.DefaultEventHandler)

===============
state-change.log
===============
[2014-08-11 19:12:45,312] TRACE Controller 0 epoch 3 started leader election for partition [mytopic,0] (state.change.logger)
[2014-08-11 19:12:45,321] ERROR Controller 0 epoch 3 initiated state change for partition [mytopic,0] from OfflinePartition to OnlinePartition failed (state.change.logger)
kafka.common.NoReplicaOnlineException: No replica for partition [mytopic,0] is alive. Live brokers are: [Set()], Assigned replicas are: [List(0)]
        at kafka.controller.OfflinePartitionLeaderSelector.selectLeader(PartitionLeaderSelector.scala:61)
[2014-08-11 19:12:45,312] TRACE Controller 0 epoch 3 started leader election for partition [mytopic,1] (state.change.logger)
[2014-08-11 19:12:45,321] ERROR Controller 0 epoch 3 initiated state change for partition [mytopic,1] from OfflinePartition to OnlinePartition failed (state.change.logger)
kafka.common.NoReplicaOnlineException: No replica for partition [mytopic,1] is alive. Live brokers are: [Set()], Assigned replicas are: [List(0)]
        at kafka.controller.OfflinePartitionLeaderSelector.selectLeader(PartitionLeaderSelector.scala:61)

===============
controller.log
===============
[2014-08-11 19:12:45,308] DEBUG [OfflinePartitionLeaderSelector]: No broker in ISR is alive for [mytopic,1]. Pick the leader from the alive assigned replicas:  (kafka.controller.OfflinePartitionLeaderSelector)
[2014-08-11 19:12:45,321] DEBUG [OfflinePartitionLeaderSelector]: No broker in ISR is alive for [mytopic,0]. Pick the leader from the alive assigned replicas:  (kafka.controller.OfflinePartitionLeaderSelector)

> NoReplicaOnlineException: No replica for partition
> --------------------------------------------------
>
>                 Key: KAFKA-1460
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1460
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.8.1.1
>            Reporter: Artur Denysenko
>            Priority: Critical
>         Attachments: state-change.log
>
>
> We have a standalone Kafka server.
> After several days of running, we get:
> {noformat}
> kafka.common.NoReplicaOnlineException: No replica for partition [gk.q.module,1] is alive. Live brokers are: [Set()], Assigned replicas are: [List(0)]
>       at kafka.controller.OfflinePartitionLeaderSelector.selectLeader(PartitionLeaderSelector.scala:61)
>       at kafka.controller.PartitionStateMachine.electLeaderForPartition(PartitionStateMachine.scala:336)
>       at kafka.controller.PartitionStateMachine.kafka$controller$PartitionStateMachine$$handleStateChange(PartitionStateMachine.scala:185)
>       at kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:99)
>       at kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:96)
>       at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:743)
>       at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:95)
>       at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:95)
>       at scala.collection.Iterator$class.foreach(Iterator.scala:772)
>       at scala.collection.mutable.HashTable$$anon$1.foreach(HashTable.scala:157)
>       at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:190)
>       at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:45)
>       at scala.collection.mutable.HashMap.foreach(HashMap.scala:95)
>       at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:742)
>       at kafka.controller.PartitionStateMachine.triggerOnlinePartitionStateChange(PartitionStateMachine.scala:96)
>       at kafka.controller.PartitionStateMachine.startup(PartitionStateMachine.scala:68)
>       at kafka.controller.KafkaController.onControllerFailover(KafkaController.scala:312)
>       at kafka.controller.KafkaController$$anonfun$1.apply$mcV$sp(KafkaController.scala:162)
>       at kafka.server.ZookeeperLeaderElector.elect(ZookeeperLeaderElector.scala:63)
>       at kafka.controller.KafkaController$SessionExpirationListener$$anonfun$handleNewSession$1.apply$mcZ$sp(KafkaController.scala:1068)
>       at kafka.controller.KafkaController$SessionExpirationListener$$anonfun$handleNewSession$1.apply(KafkaController.scala:1066)
>       at kafka.controller.KafkaController$SessionExpirationListener$$anonfun$handleNewSession$1.apply(KafkaController.scala:1066)
>       at kafka.utils.Utils$.inLock(Utils.scala:538)
>       at kafka.controller.KafkaController$SessionExpirationListener.handleNewSession(KafkaController.scala:1066)
>       at org.I0Itec.zkclient.ZkClient$4.run(ZkClient.java:472)
>       at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
> {noformat}
> Please see the attached [state-change.log].
> You can find all server logs (450 MB) here: http://46.4.114.35:9999/deploy/kafka-logs.2014-05-14-16.tgz
> On the client we get:
> {noformat}
> 16:28:36,843 [ool-12-thread-2] WARN  ZookeeperConsumerConnector - [dev_dev-1400257716132-e7b8240c], no brokers found when trying to rebalance.
> {noformat}
> If we try to send a message using 'kafka-console-producer.sh':
> {noformat}
> [root@dev kafka]# /srv/kafka/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
> message
> SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
> SLF4J: Defaulting to no-operation (NOP) logger implementation
> SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
> [2014-05-16 19:45:30,950] WARN Fetching topic metadata with correlation id 0 for topics [Set(test)] from broker [id:0,host:localhost,port:9092] failed (kafka.client.ClientUtils$)
> java.net.SocketTimeoutException
>         at sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:229)
>         at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103)
>         at java.nio.channels.Channels$ReadableByteChannelImpl.read(Channels.java:385)
>         at kafka.utils.Utils$.read(Utils.scala:375)
>         at kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54)
>         at kafka.network.Receive$class.readCompletely(Transmission.scala:56)
>         at kafka.network.BoundedByteBufferReceive.readCompletely(BoundedByteBufferReceive.scala:29)
>         at kafka.network.BlockingChannel.receive(BlockingChannel.scala:100)
>         at kafka.producer.SyncProducer.liftedTree1$1(SyncProducer.scala:74)
>         at kafka.producer.SyncProducer.kafka$producer$SyncProducer$$doSend(SyncProducer.scala:71)
>         at kafka.producer.SyncProducer.send(SyncProducer.scala:112)
>         at kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:53)
>         at kafka.producer.BrokerPartitionInfo.updateInfo(BrokerPartitionInfo.scala:82)
>         at kafka.producer.async.DefaultEventHandler$$anonfun$handle$1.apply$mcV$sp(DefaultEventHandler.scala:67)
>         at kafka.utils.Utils$.swallow(Utils.scala:167)
>         at kafka.utils.Logging$class.swallowError(Logging.scala:106)
>         at kafka.utils.Utils$.swallowError(Utils.scala:46)
>         at kafka.producer.async.DefaultEventHandler.handle(DefaultEventHandler.scala:67)
>         at kafka.producer.async.ProducerSendThread.tryToHandle(ProducerSendThread.scala:104)
>         at kafka.producer.async.ProducerSendThread$$anonfun$processEvents$3.apply(ProducerSendThread.scala:87)
>         at kafka.producer.async.ProducerSendThread$$anonfun$processEvents$3.apply(ProducerSendThread.scala:67)
>         at scala.collection.immutable.Stream.foreach(Stream.scala:526)
>         at kafka.producer.async.ProducerSendThread.processEvents(ProducerSendThread.scala:66)
>         at kafka.producer.async.ProducerSendThread.run(ProducerSendThread.scala:44)
> {noformat}
> If we try to receive a message using 'kafka-console-consumer.sh':
> {noformat}
> [root@dev kafka]# /srv/kafka/bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test
> SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
> SLF4J: Defaulting to no-operation (NOP) logger implementation
> SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
> [2014-05-16 19:46:23,029] WARN [console-consumer-69449_dev-1400262382648-1c9bfcd3], no brokers found when trying to rebalance. (kafka.consumer.ZookeeperConsumerConnector)
> {noformat}
> Port 9092 is open:
> {noformat}
> [root@dev kafka]# telnet localhost 9092
> Trying 127.0.0.1...
> Connected to localhost.
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)
