[ https://issues.apache.org/jira/browse/KAFKA-6178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16793461#comment-16793461 ]

Narayan Periwal commented on KAFKA-6178:
----------------------------------------

We are also seeing the same issue in our Kafka cluster. We are running version 0.10.2.1.

> Broker is listed as only ISR for all partitions it is leader of
> ---------------------------------------------------------------
>
>                 Key: KAFKA-6178
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6178
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.10.1.0
>         Environment: Windows
>            Reporter: AS
>            Priority: Major
>              Labels: windows
>         Attachments: KafkaServiceOutput.txt, log-cleaner.log, server.log
>
>
> We're running a 15-broker cluster on Windows machines, and one of the
> brokers, 10, is the only ISR on all partitions that it is the leader of. On
> partitions where it isn't the leader, it seems to follow the leader fine.
>
> This is an excerpt from 'describe':
>
> Topic: ClientQosCombined Partition: 458 Leader: 10 Replicas: 10,6,7,8,9,0,1 Isr: 10
> Topic: ClientQosCombined Partition: 459 Leader: 11 Replicas: 11,7,8,9,0,1,10 Isr: 0,10,1,9,7,11,8
>
> The server.log files all seem to be pretty standard, and the only indication
> of this issue is the following pattern, which repeats often for each of the
> partitions that 10 leads:
>
> 2017-11-06 20:28:25,207 [INFO] kafka.cluster.Partition [kafka-request-handler-8:] - Partition [ClientQosCombined,398] on broker 10: Expanding ISR for partition [ClientQosCombined,398] from 10 to 5,10
> 2017-11-06 20:28:39,382 [INFO] kafka.cluster.Partition [kafka-scheduler-1:] - Partition [ClientQosCombined,398] on broker 10: Shrinking ISR for partition [ClientQosCombined,398] from 5,10 to 10
>
> This is the only topic that we currently have in our cluster. The
> __consumer_offsets topic seems completely normal in terms of ISR counts.
> The controller is broker 5, which is cycling through attempting and failing
> to trigger leader elections on the partitions led by broker 10.
>
> From the controller log in broker 5:
>
> 2017-11-06 20:45:04,857 [INFO] kafka.controller.KafkaController [kafka-scheduler-0:] - [Controller 5]: Starting preferred replica leader election for partitions [ClientQosCombined,375]
> 2017-11-06 20:45:04,857 [INFO] kafka.controller.PartitionStateMachine [kafka-scheduler-0:] - [Partition state machine on Controller 5]: Invoking state change to OnlinePartition for partitions [ClientQosCombined,375]
> 2017-11-06 20:45:04,857 [INFO] kafka.controller.PreferredReplicaPartitionLeaderSelector [kafka-scheduler-0:] - [PreferredReplicaPartitionLeaderSelector]: Current leader 10 for partition [ClientQosCombined,375] is not the preferred replica. Trigerring preferred replica leader election
> 2017-11-06 20:45:04,857 [WARN] kafka.controller.KafkaController [kafka-scheduler-0:] - [Controller 5]: Partition [ClientQosCombined,375] failed to complete preferred replica leader election. Leader is 10
>
> I've also attached the logs and output from broker 10. Any idea what's wrong
> here?

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)