[ 
https://issues.apache.org/jira/browse/KAFKA-1460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14205531#comment-14205531
 ] 

Jon Riegel commented on KAFKA-1460:
-----------------------------------

Separating kafka and zookeeper onto different instances is definitely something 
we plan to do near term.  However, their ZK and kafka data directories were 
configured to be on different EBS volumes at the time of this incident, and it 
did not occur at a peak usage time, so I'm disinclined to chalk the outage up 
to I/O load.  In addition, each of our topics has replication factor 3.

> NoReplicaOnlineException: No replica for partition
> --------------------------------------------------
>
>                 Key: KAFKA-1460
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1460
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.8.1.1
>            Reporter: Artur Denysenko
>            Priority: Critical
>         Attachments: state-change.log
>
>
> We have a standalone kafka server.
> After several days of running we get:
> {noformat}
> kafka.common.NoReplicaOnlineException: No replica for partition 
> [gk.q.module,1] is alive. Live brokers are: [Set()], Assigned replicas are: 
> [List(0)]
>       at 
> kafka.controller.OfflinePartitionLeaderSelector.selectLeader(PartitionLeaderSelector.scala:61)
>       at 
> kafka.controller.PartitionStateMachine.electLeaderForPartition(PartitionStateMachine.scala:336)
>       at 
> kafka.controller.PartitionStateMachine.kafka$controller$PartitionStateMachine$$handleStateChange(PartitionStateMachine.scala:185)
>       at 
> kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:99)
>       at 
> kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:96)
>       at 
> scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:743)
>       at 
> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:95)
>       at 
> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:95)
>       at scala.collection.Iterator$class.foreach(Iterator.scala:772)
>       at 
> scala.collection.mutable.HashTable$$anon$1.foreach(HashTable.scala:157)
>       at 
> scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:190)
>       at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:45)
>       at scala.collection.mutable.HashMap.foreach(HashMap.scala:95)
>       at 
> scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:742)
>       at 
> kafka.controller.PartitionStateMachine.triggerOnlinePartitionStateChange(PartitionStateMachine.scala:96)
>       at 
> kafka.controller.PartitionStateMachine.startup(PartitionStateMachine.scala:68)
>       at 
> kafka.controller.KafkaController.onControllerFailover(KafkaController.scala:312)
>       at 
> kafka.controller.KafkaController$$anonfun$1.apply$mcV$sp(KafkaController.scala:162)
>       at 
> kafka.server.ZookeeperLeaderElector.elect(ZookeeperLeaderElector.scala:63)
>       at 
> kafka.controller.KafkaController$SessionExpirationListener$$anonfun$handleNewSession$1.apply$mcZ$sp(KafkaController.scala:1068)
>       at 
> kafka.controller.KafkaController$SessionExpirationListener$$anonfun$handleNewSession$1.apply(KafkaController.scala:1066)
>       at 
> kafka.controller.KafkaController$SessionExpirationListener$$anonfun$handleNewSession$1.apply(KafkaController.scala:1066)
>       at kafka.utils.Utils$.inLock(Utils.scala:538)
>       at 
> kafka.controller.KafkaController$SessionExpirationListener.handleNewSession(KafkaController.scala:1066)
>       at org.I0Itec.zkclient.ZkClient$4.run(ZkClient.java:472)
>       at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
> {noformat}
> Please see attached [state-change.log]
> You can find all server logs (450mb) here: 
> http://46.4.114.35:9999/deploy/kafka-logs.2014-05-14-16.tgz
> On client we get:
> {noformat}
> 16:28:36,843 [ool-12-thread-2] WARN  ZookeeperConsumerConnector - 
> [dev_dev-1400257716132-e7b8240c], no brokers found when trying to rebalance.
> {noformat}
> If we try to send message using 'kafka-console-producer.sh':
> {noformat}
> [root@dev kafka]# /srv/kafka/bin/kafka-console-producer.sh --broker-list 
> localhost:9092 --topic test
> message
> SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
> SLF4J: Defaulting to no-operation (NOP) logger implementation
> SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further 
> details.
> [2014-05-16 19:45:30,950] WARN Fetching topic metadata with correlation id 0 
> for topics [Set(test)] from broker [id:0,host:localhost,port:9092] failed 
> (kafka.client.ClientUtils$)
> java.net.SocketTimeoutException
>         at 
> sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:229)
>         at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103)
>         at 
> java.nio.channels.Channels$ReadableByteChannelImpl.read(Channels.java:385)
>         at kafka.utils.Utils$.read(Utils.scala:375)
>         at 
> kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54)
>         at kafka.network.Receive$class.readCompletely(Transmission.scala:56)
>         at 
> kafka.network.BoundedByteBufferReceive.readCompletely(BoundedByteBufferReceive.scala:29)
>         at kafka.network.BlockingChannel.receive(BlockingChannel.scala:100)
>         at kafka.producer.SyncProducer.liftedTree1$1(SyncProducer.scala:74)
>         at 
> kafka.producer.SyncProducer.kafka$producer$SyncProducer$$doSend(SyncProducer.scala:71)
>         at kafka.producer.SyncProducer.send(SyncProducer.scala:112)
>         at kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:53)
>         at 
> kafka.producer.BrokerPartitionInfo.updateInfo(BrokerPartitionInfo.scala:82)
>         at 
> kafka.producer.async.DefaultEventHandler$$anonfun$handle$1.apply$mcV$sp(DefaultEventHandler.scala:67)
>         at kafka.utils.Utils$.swallow(Utils.scala:167)
>         at kafka.utils.Logging$class.swallowError(Logging.scala:106)
>         at kafka.utils.Utils$.swallowError(Utils.scala:46)
>         at 
> kafka.producer.async.DefaultEventHandler.handle(DefaultEventHandler.scala:67)
>         at 
> kafka.producer.async.ProducerSendThread.tryToHandle(ProducerSendThread.scala:104)
>         at 
> kafka.producer.async.ProducerSendThread$$anonfun$processEvents$3.apply(ProducerSendThread.scala:87)
>         at 
> kafka.producer.async.ProducerSendThread$$anonfun$processEvents$3.apply(ProducerSendThread.scala:67)
>         at scala.collection.immutable.Stream.foreach(Stream.scala:526)
>         at 
> kafka.producer.async.ProducerSendThread.processEvents(ProducerSendThread.scala:66)
>         at 
> kafka.producer.async.ProducerSendThread.run(ProducerSendThread.scala:44)
> {noformat}
> If we try to receive message using 'kafka-console-consumer.sh':
> {noformat}
> [root@dev kafka]# /srv/kafka/bin/kafka-console-consumer.sh --zookeeper 
> localhost:2181 --topic test
> SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
> SLF4J: Defaulting to no-operation (NOP) logger implementation
> SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further 
> details.
> [2014-05-16 19:46:23,029] WARN 
> [console-consumer-69449_dev-1400262382648-1c9bfcd3], no brokers found when 
> trying to rebalance. (kafka.consumer.ZookeeperConsumerConnector)
> {noformat}
> Port 9092 is open:
> {noformat}
> [root@dev kafka]# telnet localhost 9092
> Trying 127.0.0.1...
> Connected to localhost.
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to