This means that broker 0 thought broker 28 was the leader for that partition, 
but broker 28 had actually already received a StopReplicaRequest from the 
controller and stopped serving as a replica for that partition.
This can happen transiently; broker 0 will find the new leader for the 
partition once it receives a LeaderAndIsrRequest from the controller with the 
updated leader information. If these messages keep getting logged for a long 
time, then there might be an issue.
Could you check around the timestamp [2015-04-21 12:15:36,585] on broker 28 to 
see if there is an error log there? The error log might not have the partition 
info included.
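
If it helps, here is a rough sketch (Python with kazoo; the ZooKeeper address 
and the broker log path are assumptions, adjust them to your setup) that reads 
the partition state znode to see which broker ZooKeeper currently records as 
the leader, and scans a broker log for ERROR lines near that timestamp:

# Rough sketch only: check the leader/ISR that ZooKeeper records for a
# partition, then scan a broker log for ERROR entries around a timestamp.
import json
from datetime import datetime, timedelta
from kazoo.client import KazooClient

ZK_HOSTS = "localhost:2181"               # assumption: your ZooKeeper quorum
TOPIC, PARTITION = "pings", 227
BROKER_LOG = "/var/log/kafka/server.log"  # assumption: broker 28's server log
AROUND = datetime(2015, 4, 21, 12, 15, 36)
WINDOW = timedelta(minutes=2)

# 1. Which broker does ZooKeeper think leads [pings,227]?
zk = KazooClient(hosts=ZK_HOSTS)
zk.start()
state_path = "/brokers/topics/%s/partitions/%d/state" % (TOPIC, PARTITION)
data, _ = zk.get(state_path)
state = json.loads(data.decode("utf-8"))
print("leader=%s isr=%s leader_epoch=%s"
      % (state.get("leader"), state.get("isr"), state.get("leader_epoch")))
zk.stop()

# 2. Any ERROR lines on the broker around the suspicious timestamp?
with open(BROKER_LOG) as f:
    for line in f:
        if "ERROR" not in line:
            continue
        try:
            ts = datetime.strptime(line[1:24], "%Y-%m-%d %H:%M:%S,%f")
        except ValueError:
            continue
        if abs(ts - AROUND) <= WINDOW:
            print(line.rstrip())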

From: Wes Chow <w...@chartbeat.com>
Reply-To: "users@kafka.apache.org" <users@kafka.apache.org>
Date: Tuesday, April 21, 2015 at 10:50 AM
To: "users@kafka.apache.org" <users@kafka.apache.org>
Subject: Re: partition reassignment stuck


Not for that particular partition, but I am seeing these errors on 28:

kafka.common.NotAssignedReplicaException: Leader 28 failed to record follower 25's position 0 for partition [click_engage,116] since the replica 25 is not recognized to be one of the assigned replicas for partition [click_engage,116]
        at kafka.cluster.Partition.updateLeaderHWAndMaybeExpandIsr(Partition.scala:231)
        at kafka.server.ReplicaManager.recordFollowerPosition(ReplicaManager.scala:432)
        at kafka.server.KafkaApis$$anonfun$maybeUpdatePartitionHw$2.apply(KafkaApis.scala:460)
        at kafka.server.KafkaApis$$anonfun$maybeUpdatePartitionHw$2.apply(KafkaApis.scala:458)
        at scala.collection.immutable.HashMap$HashMap1.foreach(HashMap.scala:176)
        at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:345)
        at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:345)
        at kafka.server.KafkaApis.maybeUpdatePartitionHw(KafkaApis.scala:458)
        at kafka.server.KafkaApis.handleFetchRequest(KafkaApis.scala:424)
        at kafka.server.KafkaApis.handle(KafkaApis.scala:186)
        at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:42)
        at java.lang.Thread.run(Thread.java:745)

What does this mean?

Thanks!
Wes


Jiangjie Qin <j...@linkedin.com.INVALID>
April 21, 2015 at 1:19 PM
Those 00000000000000000000.index files are for different partitions, and 
they should be generated when new replicas are assigned to the broker.
We might want to know what caused the UnknownException. Did you see any 
error log on broker 28?

Jiangjie (Becket) Qin
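
If it is useful, a small sketch (Python; the data directory path is an 
assumption) to list which partition directories on the broker look like 
freshly assigned replicas, i.e. contain the initial 
00000000000000000000.index next to an empty log segment:

# Sketch: list partition directories under the Kafka data dir that hold the
# initial index plus an empty 00000000000000000000.log, which is what a
# newly assigned replica looks like before it has fetched any data.
import os

DATA_DIR = "/var/kafka/data"   # assumption: one of the broker's log.dirs

for name in sorted(os.listdir(DATA_DIR)):
    path = os.path.join(DATA_DIR, name)
    if not os.path.isdir(path):
        continue
    files = os.listdir(path)
    log = os.path.join(path, "00000000000000000000.log")
    if ("00000000000000000000.index" in files
            and os.path.exists(log) and os.path.getsize(log) == 0):
        print("empty replica dir: %s" % name)   # e.g. click_engage-116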


Wes Chow <w...@chartbeat.com>
April 21, 2015 at 12:16 PM
I started a partition reassignment (this is an 8.1.1 cluster) some time ago and 
it seems to be stuck. Partitions are no longer getting moved around, but the 
cluster seems to be operational otherwise. The stuck nodes have a lot of 
00000000000000000000.index files, and their logs show errors like:

[2015-04-21 12:15:36,585] 3237789 [ReplicaFetcherThread-0-28] ERROR kafka.server.ReplicaFetcherThread  - [ReplicaFetcherThread-0-28], Error for partition [pings,227] to broker 28:class kafka.common.UnknownException

I'm at a loss as to what I should be looking at. Any ideas?

Thanks,
Wes
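
In case it helps narrow things down, a sketch (Python with kazoo; the 
ZooKeeper address is an assumption) that reads /admin/reassign_partitions, 
the znode the reassignment tool writes and the controller shrinks as each 
partition completes, to see which moves are still considered in flight:

# Sketch: show which partition moves the controller still has pending.
import json
from kazoo.client import KazooClient

zk = KazooClient(hosts="localhost:2181")   # assumption: your ZooKeeper quorum
zk.start()
if zk.exists("/admin/reassign_partitions"):
    data, _ = zk.get("/admin/reassign_partitions")
    for p in json.loads(data.decode("utf-8")).get("partitions", []):
        print("still moving: [%s,%d] -> %s"
              % (p["topic"], p["partition"], p["replicas"]))
else:
    print("no reassignment in flight according to ZooKeeper")
zk.stop()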
