Hi,

We have 44 partitions in our cluster that are unavailable. kafka-topics.sh is 
reporting them with Leader: -1, and with no brokers in the ISR. Zookeeper says 
that broker 5 should be the partition leader for this topic partition. These 
are topics with replication-factor 1. Most of the topics have little to no data 
in them, so they are low-traffic topics. We currently cannot produce to them. 
And I can't find anything in the logs that seems to explain why the broker is 
not taking the partitions back online. Can anyone help?

Relevant log lines are attached below.

Questions:
* What does Leader: -1 mean?
* Why doesn't the broker take the partition back online?
* Is there more debugging/logging that I can turn on?
* unclean.leader.election.enable=false right now, although during a previous 
boot of the broker, we set it to true to get some partitions back online. These 
ones never came back online.

Thanks,
-James

in zookeeper
-------
$ get /brokers/topics/the.topic.name
{"version":1,"partitions":{"0":[5]}}

server.log
--------
[2016-03-01 06:29:13,869] WARN Found an corrupted index file, 
/TivoData/kafka/the.topic.name-0/00000000000000000000.index, deleting and 
rebuilding index... (kafka.log.Log)
[2016-03-01 06:29:13,870] INFO Recovering unflushed segment 0 in log 
the.topic.name-0. (kafka.log.Log)
[2016-03-01 06:29:13,870] INFO Completed load of log the.topic.name-0 with log 
end offset 0 (kafka.log.Log)


state-change.log
---------
state-change.log.2016-03-01-06:[2016-03-01 06:34:20,498] TRACE Broker 5 cached 
leader info 
(LeaderAndIsrInfo:(Leader:-1,ISR:,LeaderEpoch:56,ControllerEpoch:20),ReplicationFactor:1),AllReplicas:5)
 for partition [the.topic.name,0] in response to UpdateMetadata request sent by 
controller 2 epoch 20 with correlation id 0 (state.change.logger)
state-change.log.2016-03-01-06:[2016-03-01 06:34:20,695] TRACE Broker 5 
received LeaderAndIsr request 
(LeaderAndIsrInfo:(Leader:-1,ISR:,LeaderEpoch:56,ControllerEpoch:20),ReplicationFactor:1),AllReplicas:5)
 correlation id 1 from controller 2 epoch 20 for partition [the.topic.name,0] 
(state.change.logger)
state-change.log.2016-03-01-06:[2016-03-01 06:34:20,957] TRACE Broker 5 
handling LeaderAndIsr request correlationId 1 from controller 2 epoch 20 
starting the become-follower transition for partition [the.topic.name,0] 
(state.change.logger)
state-change.log.2016-03-01-06:[2016-03-01 06:34:23,531] ERROR Broker 5 
received LeaderAndIsrRequest with correlation id 1 from controller 2 epoch 20 
for partition [the.topic.name,0] but cannot become follower since the new 
leader -1 is unavailable. (state.change.logger)
state-change.log.2016-03-01-06:[2016-03-01 06:34:30,075] TRACE Broker 5 
completed LeaderAndIsr request correlationId 1 from controller 2 epoch 20 for 
the become-follower transition for partition [the.topic.name,0] 
(state.change.logger)
state-change.log.2016-03-01-06:[2016-03-01 06:34:30,458] TRACE Broker 5 cached 
leader info 
(LeaderAndIsrInfo:(Leader:-1,ISR:,LeaderEpoch:56,ControllerEpoch:20),ReplicationFactor:1),AllReplicas:5)
 for partition [the.topic.name,0] in response to UpdateMetadata request sent by 
controller 2 epoch 20 with correlation id 2 (state.change.logger)

state-change.log on the controller:
[2016-03-01 06:34:15,077] TRACE Controller 2 epoch 20 sending UpdateMetadata 
request (Leader:-1,ISR:,LeaderEpoch:56,ControllerEpoch:20) to broker 2 for 
partition the.topic.name-0 (state.change.logger)
[2016-03-01 06:34:15,145] TRACE Controller 2 epoch 20 sending UpdateMetadata 
request (Leader:-1,ISR:,LeaderEpoch:56,ControllerEpoch:20) to broker 5 for 
partition the.topic.name-0 (state.change.logger)
[2016-03-01 06:34:15,200] TRACE Broker 2 cached leader info 
(LeaderAndIsrInfo:(Leader:-1,ISR:,LeaderEpoch:56,ControllerEpoch:20),ReplicationFactor:1),AllReplicas:5)
 for partition [the.topic.name,0] in response to UpdateMetadata request sent by 
controller 2 epoch 20 with correlation id 9144 (state.change.logger)
[2016-03-01 06:34:15,276] TRACE Controller 2 epoch 20 sending UpdateMetadata 
request (Leader:-1,ISR:,LeaderEpoch:56,ControllerEpoch:20) to broker 4 for 
partition the.topic.name-0 (state.change.logger)
[2016-03-01 06:34:15,418] TRACE Controller 2 epoch 20 sending UpdateMetadata 
request (Leader:-1,ISR:,LeaderEpoch:56,ControllerEpoch:20) to broker 1 for 
partition the.topic.name-0 (state.change.logger)
[2016-03-01 06:34:15,484] TRACE Controller 2 epoch 20 sending UpdateMetadata 
request (Leader:-1,ISR:,LeaderEpoch:56,ControllerEpoch:20) to broker 3 for 
partition the.topic.name-0 (state.change.logger)
[2016-03-01 06:34:15,585] TRACE Controller 2 epoch 20 changed state of replica 
5 for partition [the.topic.name,0] from OfflineReplica to OnlineReplica 
(state.change.logger)
[2016-03-01 06:34:15,606] TRACE Controller 2 epoch 20 sending become-follower 
LeaderAndIsr request (Leader:-1,ISR:,LeaderEpoch:56,ControllerEpoch:20) to 
broker 5 for partition [the.topic.name,0] (state.change.logger)
[2016-03-01 06:34:15,645] TRACE Controller 2 epoch 20 sending UpdateMetadata 
request (Leader:-1,ISR:,LeaderEpoch:56,ControllerEpoch:20) to broker 2 for 
partition the.topic.name-0 (state.change.logger)
[2016-03-01 06:34:15,775] TRACE Controller 2 epoch 20 sending UpdateMetadata 
request (Leader:-1,ISR:,LeaderEpoch:56,ControllerEpoch:20) to broker 5 for 
partition the.topic.name-0 (state.change.logger)
[2016-03-01 06:34:15,787] TRACE Broker 2 cached leader info 
(LeaderAndIsrInfo:(Leader:-1,ISR:,LeaderEpoch:56,ControllerEpoch:20),ReplicationFactor:1),AllReplicas:5)
 for partition [the.topic.name,0] in response to UpdateMetadata request sent by 
controller 2 epoch 20 with correlation id 9145 (state.change.logger)
[2016-03-01 06:34:15,838] TRACE Controller 2 epoch 20 sending UpdateMetadata 
request (Leader:-1,ISR:,LeaderEpoch:56,ControllerEpoch:20) to broker 4 for 
partition the.topic.name-0 (state.change.logger)
[2016-03-01 06:34:15,879] TRACE Controller 2 epoch 20 sending UpdateMetadata 
request (Leader:-1,ISR:,LeaderEpoch:56,ControllerEpoch:20) to broker 1 for 
partition the.topic.name-0 (state.change.logger)
[2016-03-01 06:34:15,909] TRACE Controller 2 epoch 20 sending UpdateMetadata 
request (Leader:-1,ISR:,LeaderEpoch:56,ControllerEpoch:20) to broker 3 for 
partition the.topic.name-0 (state.change.logger)
[2016-03-01 06:34:15,972] TRACE Controller 2 epoch 20 started leader election 
for partition [the.topic.name,0] (state.change.logger)
[2016-03-01 06:34:15,973] ERROR Controller 2 epoch 20 initiated state change 
for partition [the.topic.name,0] from OfflinePartition to OnlinePartition 
failed (state.change.logger)
kafka.common.NoReplicaOnlineException: No broker in ISR for partition 
[the.topic.name,0] is alive. Live brokers are: [Set(5, 1, 2, 3, 4)], ISR 
brokers are: []
[2016-03-01 06:34:30,427] TRACE Controller 2 epoch 20 received response 
{error_code=0,partitions=[{topic=the.topic.name=0,error_code=0}]} for a request 
sent to broker Node(5, core05.tec1.tivo.com, 9092) (state.change.logger)
[2016-03-01 06:38:41,552] TRACE Controller 2 epoch 20 started leader election 
for partition [the.topic.name,0] (state.change.logger)
[2016-03-01 06:38:41,554] ERROR Controller 2 epoch 20 encountered error while 
electing leader for partition [the.topic.name,0] due to: Preferred replica 5 
for partition [the.topic.name,0] is either not alive or not in the isr. Current 
leader and ISR: [{"leader":-1,"leader_epoch":56,"isr":[]}]. 
(state.change.logger)
[2016-03-01 06:38:41,554] ERROR Controller 2 epoch 20 initiated state change 
for partition [the.topic.name,0] from OfflinePartition to OnlinePartition 
failed (state.change.logger)
kafka.common.StateChangeFailedException: encountered error while electing 
leader for partition [the.topic.name,0] due to: Preferred replica 5 for 
partition [the.topic.name,0] is either not alive or not in the isr. Current 
leader and ISR: [{"leader":-1,"leader_epoch":56,"isr":[]}].
Caused by: kafka.common.StateChangeFailedException: Preferred replica 5 for 
partition [the.topic.name,0] is either not alive or not in the isr. Current 
leader and ISR: [{"leader":-1,"leader_epoch":56,"isr":[]}]


________________________________

This email and any attachments may contain confidential and privileged material 
for the sole use of the intended recipient. Any review, copying, or 
distribution of this email (or any attachments) by others is prohibited. If you 
are not the intended recipient, please contact the sender immediately and 
permanently delete this email and any attachments. No employee or agent of TiVo 
Inc. is authorized to conclude any binding agreement on behalf of TiVo Inc. by 
email. Binding agreements with TiVo Inc. may only be made by a signed written 
agreement.

Reply via email to