[jira] [Commented] (KAFKA-3126) Weird behavior in kafkaController on Controlled shutdowns. The leaderAndIsr in zookeeper is not updated during controlled shutdown.

Jiangjie Qin (JIRA) Thu, 21 Jan 2016 11:01:07 -0800

    [ 
https://issues.apache.org/jira/browse/KAFKA-3126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15111099#comment-15111099
 ]


Jiangjie Qin commented on KAFKA-3126:
-------------------------------------

Currently the controlled shutdown and the state update actually happen in 
parallel. 

1. We update the partition states one by one, releasing and re-acquiring the 
lock between each partition state change.
2. LeaderAndIsrRequests issued during controlled shutdown are sent 
asynchronously.
3. It is possible that the leader updates the ISR after the controller updates 
the ISR during controlled shutdown.

Can we exclude the following sequence?
0. Partition p, leader broker C, ISR [A, C]
1. Broker A sends a controlled shutdown request.
2. Controller B updates the ISR of partition p from [A, C] to [C]
3. Before the LeaderAndIsrRequest reflecting the change in (2) reaches broker 
C, broker C expands leader and ISR from [A] to [A, C].
4. The ISR change in (3) is propagated to controller B.
5. When Broker A actually shuts down, Controller B will see A in the ISR.
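
The sequence above can be replayed with a small toy model. This is only a 
sketch, not Kafka code: `IsrStore`, `replay_sequence` and the pending-request 
list are hypothetical names made up for illustration, and the model assumes 
the leader's ISR write is applied unconditionally after the controller's write 
(last write wins, no version check). Whether the real code path allows that is 
exactly the open question.

```python
from dataclasses import dataclass, field


@dataclass
class IsrStore:
    """Toy stand-in for the shared leaderAndIsr state (assumption: last write wins)."""
    isr: dict = field(default_factory=dict)
    history: list = field(default_factory=list)

    def write(self, writer, partition, isr):
        # Record who wrote what, then overwrite the stored ISR unconditionally.
        self.isr[partition] = list(isr)
        self.history.append((writer, partition, list(isr)))


def replay_sequence():
    store = IsrStore()

    # Step 0: partition p, leader broker C, ISR [A, C].
    store.write("initial", "p", ["A", "C"])

    # Steps 1-2: A requests controlled shutdown; controller B removes A from the ISR.
    store.write("controller B", "p", ["C"])
    # The LeaderAndIsrRequest is queued asynchronously and not yet delivered to C.
    pending_requests = [("C", "p", ["C"])]

    # Step 3: leader C has not seen the request yet and publishes an ISR containing A.
    store.write("leader C", "p", ["A", "C"])

    # The request reaches C only now, too late to prevent step 3.
    pending_requests.pop()

    # Steps 4-5: this is the state controller B reads when A actually shuts down.
    return store


store = replay_sequence()
print(store.isr["p"])  # ['A', 'C']
```

In this model the controller still finds A in the ISR at step 5, because the 
leader's write landed after the controller's.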


> Weird behavior in kafkaController on Controlled shutdowns. The leaderAndIsr 
> in zookeeper is not updated during controlled shutdown.
> -----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-3126
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3126
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>            Reporter: Mayuresh Gharat
>            Assignee: Mayuresh Gharat
>
> Consider the case where broker B is the controller and broker A is 
> undergoing shutdown. 
> 2016/01/14 19:49:22.884 [KafkaController] [Controller B]: Shutting down 
> broker A
> 2016/01/14 19:49:22.918 [ReplicaStateMachine] [Replica state machine on 
> controller B]: Invoking state change to OfflineReplica for replicas 
> [Topic=testTopic1,Partition=1,Replica=A] -------> (1)
> 2016/01/14 19:49:22.930 [KafkaController] [Controller B]: New leader and ISR 
> for partition [testTopic1,1] is {"leader":D,"leader_epoch":1,"isr":[D]} 
> ------> (2)
> 2016/01/14 19:49:23.028 [ReplicaStateMachine] [Replica state machine on 
> controller B]: Invoking state change to OfflineReplica for replicas 
> [Topic=testTopic2,Partition=1,Replica=A] -------> (3)
> 2016/01/14 19:49:23.032 [KafkaController] [Controller B]: New leader and ISR 
> for partition [testTopic2,1] is {"leader":C,"leader_epoch":10,"isr":[C]} 
> -----> (4)
> 2016/01/14 19:49:23.996 [KafkaController] [Controller B]: Broker failure 
> callback for A
> 2016/01/14 19:49:23.997 [PartitionStateMachine] [Partition state machine on 
> Controller B]: Invoking state change to OfflinePartition for partitions 
> 2016/01/14 19:49:23.998 [ReplicaStateMachine] [Replica state machine on 
> controller B]: Invoking state change to OfflineReplica for replicas 
> [Topic=testTopic2,Partition=0,Replica=A],
> [Topic=__consumer_offsets,Partition=5,Replica=A],
> [Topic=testTopic1,Partition=2,Replica=A],
> [Topic=__consumer_offsets,Partition=96,Replica=A],
> [Topic=testTopic2,Partition=1,Replica=A],
> [Topic=__consumer_offsets,Partition=36,Replica=A],
> [Topic=testTopic1,Partition=4,Replica=A],
> [Topic=__consumer_offsets,Partition=85,Replica=A],
> [Topic=testTopic1,Partition=6,Replica=A],
> [Topic=testTopic1,Partition=1,Replica=A]
> 2016/01/14 19:49:24.029 [KafkaController] [Controller B]: New leader and ISR 
> for partition [testTopic2,1] is {"leader":C,"leader_epoch":11,"isr":[C]} 
> ------> (5)
> 2016/01/14 19:49:24.212 [KafkaController] [Controller B]: Cannot remove 
> replica A from ISR of partition [testTopic1,1] since it is not in the ISR. 
> Leader = D ; ISR = List(D) ----------> (6)
> If, after (1) and (2), the controller removes replica A from the ISR in 
> zookeeper for [testTopic1-1], as shown in (6), why doesn't it do the same 
> for [testTopic2-1], as per (5)?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
