[ https://issues.apache.org/jira/browse/KAFKA-3042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17241383#comment-17241383 ]
zhangzhisheng edited comment on KAFKA-3042 at 12/3/20, 8:27 AM:
----------------------------------------------------------------

using kafka_2.12-2.4.1, zookeeper-3.5.7
3 ZKs 3 Broker cluster, topic replication factor is 2
linux (redhat)
xfs
kafka logs on single local disk

error info
{code:java}
// server info
kafka_2.12-2.4.1/logs/server.log.2020-11-28-01:[2020-11-28 01:51:36,636] INFO [GroupCoordinator 2]: Assignment received from leader for group money-repayment-cmd-listener-1606227494097 for generation 126 (kafka.coordinator.group.GroupCoordinator)
kafka_2.12-2.4.1/logs/server.log.2020-11-28-01:[2020-11-28 01:51:37,220] INFO [Partition __consumer_offsets-13 broker=2] Shrinking ISR from 2,0,1 to 2. Leader: (highWatermark: 500993846, endOffset: 500993972). Out of sync replicas: (brokerId: 0, endOffset: 500993846) (brokerId: 1, endOffset: 500993967). (kafka.cluster.Partition)
kafka_2.12-2.4.1/logs/server.log.2020-11-28-01:[2020-11-28 01:51:37,223] INFO [Partition __consumer_offsets-13 broker=2] Cached zkVersion 131 not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
kafka_2.12-2.4.1/logs/server.log.2020-11-28-01:[2020-11-28 01:51:37,223] INFO [Partition __consumer_offsets-46 broker=2] Shrinking ISR from 2,0,1 to 2. Leader: (highWatermark: 281523643, endOffset: 281523684). Out of sync replicas: (brokerId: 0, endOffset: 281523643) (brokerId: 1, endOffset: 281523683). (kafka.cluster.Partition)
kafka_2.12-2.4.1/logs/server.log.2020-11-28-01:[2020-11-28 01:51:37,224] INFO [Partition __consumer_offsets-46 broker=2] Cached zkVersion 123 not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
kafka_2.12-2.4.1/logs/server.log.2020-11-28-01:[2020-11-28 01:51:37,224] INFO [Partition fcp-FFF-LOANFILE-201806271059-2 broker=2] Shrinking ISR from 2,0,1 to 2. Leader: (highWatermark: 9302797, endOffset: 9302806). Out of sync replicas: (brokerId: 0, endOffset: 9302797) (brokerId: 1, endOffset: 9302804). (kafka.cluster.Partition)
kafka_2.12-2.4.1/logs/server.log.2020-11-28-01:[2020-11-28 01:51:37,227] INFO [Partition fcp-FFF-LOANFILE-201806271059-2 broker=2] Cached zkVersion 125 not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
{code}
{code:java}
// state info
kafka_2.12-2.4.1/logs/state-change.log.2020-11-28-01:[2020-11-28 01:51:02,073] ERROR [Controller id=0 epoch=20] Controller 0 epoch 20 failed to change state for partition __consumer_offsets-22 from OnlinePartition to OnlinePartition (state.change.logger)
kafka_2.12-2.4.1/logs/state-change.log.2020-11-28-01:[2020-11-28 01:51:02,073] ERROR [Controller id=0 epoch=20] Controller 0 epoch 20 failed to change state for partition fcp-FFF-account-201807131719-2 from OnlinePartition to OnlinePartition (state.change.logger)
kafka_2.12-2.4.1/logs/state-change.log.2020-11-28-01:[2020-11-28 01:51:02,074] ERROR [Controller id=0 epoch=20] Controller 0 epoch 20 failed to change state for partition fcp-PCP-INSTRANSACTIONPOLICY-2018079116-0 from OnlinePartition to OnlinePartition (state.change.logger)
kafka_2.12-2.4.1/logs/state-change.log.2020-11-28-01:[2020-11-28 01:51:02,074] ERROR [Controller id=0 epoch=20] Controller 0 epoch 20 failed to change state for partition LOAN_FAIL_MANAGE-202011231831270534-1 from OnlinePartition to OnlinePartition (state.change.logger)
kafka_2.12-2.4.1/logs/state-change.log.2020-11-28-01:[2020-11-28 01:51:02,074] ERROR [Controller id=0 epoch=20] Controller 0 epoch 20 failed to change state for partition fcp-PFINMONEYMONITOR-LOANTXNSUB-201806271129-0 from OnlinePartition to OnlinePartition (state.change.logger)
kafka_2.12-2.4.1/logs/state-change.log.2020-11-28-01:[2020-11-28 01:51:02,074] ERROR [Controller id=0 epoch=20] Controller 0 epoch 20 failed to change state for partition __consumer_offsets-4 from OnlinePartition to OnlinePartition (state.change.logger)
kafka_2.12-2.4.1/logs/state-change.log.2020-11-28-01:[2020-11-28 01:51:02,074] ERROR [Controller id=0 epoch=20] Controller 0 epoch 20 failed to change state for partition __consumer_command_request-5 from OnlinePartition to OnlinePartition (state.change.logger)
kafka_2.12-2.4.1/logs/state-change.log.2020-11-28-01:[2020-11-28 01:51:02,074] ERROR [Controller id=0 epoch=20] Controller 0 epoch 20 failed to change state for partition fcp-creditcore-loan-trans-201809112022-2 from OnlinePartition to OnlinePartition (state.change.logger)
kafka_2.12-2.4.1/logs/state-change.log.2020-11-28-01:[2020-11-28 01:51:02,074] ERROR [Controller id=0 epoch=20] Controller 0 epoch 20 failed to change state for partition fcp-CREDITCORE-LOAN-TRANS-20180791126-0 from OnlinePartition to OnlinePartition (state.change.logger)
kafka_2.12-2.4.1/logs/state-change.log.2020-11-28-01:[2020-11-28 01:51:02,074] ERROR [Controller id=0 epoch=20] Controller 0 epoch 20 failed to change state for partition __consumer_offsets-7 from OnlinePartition to OnlinePartition (state.change.logger)
kafka_2.12-2.4.1/logs/state-change.log.2020-11-28-01:[2020-11-28 01:51:02,074] ERROR [Controller id=0 epoch=20] Controller 0 epoch 20 failed to change state for partition fcp-creditcore-credit-attach-20180911175-2 from OnlinePartition to OnlinePartition (state.change.logger)
kafka_2.12-2.4.1/logs/state-change.log.2020-11-28-01:[2020-11-28 01:51:02,074] ERROR [Controller id=0 epoch=20] Controller 0 epoch 20 failed to change state for partition __consumer_offsets-46 from OnlinePartition to OnlinePartition (state.change.logger)
kafka_2.12-2.4.1/logs/state-change.log.2020-11-28-01:[2020-11-28 01:51:02,074] ERROR [Controller id=0 epoch=20] Controller 0 epoch 20 failed to change state for partition fcp-creditcore-credit-acct-20180911148-2 from OnlinePartition to OnlinePartition (state.change.logger)
kafka_2.12-2.4.1/logs/state-change.log.2020-11-28-01:[2020-11-28 01:51:02,074] ERROR [Controller id=0 epoch=20] Controller 0 epoch 20 failed to change state for partition fcp-creditcore-repay-plan-detail-201809112057-1 from OnlinePartition to OnlinePartition (state.change.logger)
kafka_2.12-2.4.1/logs/state-change.log.2020-11-28-01:[2020-11-28 01:51:02,074] ERROR [Controller id=0 epoch=20] Controller 0 epoch 20 failed to change state for partition fcp-creditcore-credit-apply-20180911178-2 from OnlinePartition to OnlinePartition (state.change.logger)
kafka_2.12-2.4.1/logs/state-change.log.2020-11-28-01:[2020-11-28 01:51:02,074] ERROR [Controller id=0 epoch=20] Controller 0 epoch 20 failed to change state for partition fcp-CREDITCORE-LOAN-TRANS-20180791126-3 from OnlinePartition to OnlinePartition (state.change.logger)
kafka_2.12-2.4.1/logs/state-change.log.2020-11-28-01:[2020-11-28 01:51:02,074] ERROR [Controller id=0 epoch=20] Controller 0 epoch 20 failed to change state for partition fcp-PFINMONEY-REPAYMENTTRIALTXN-20180627138-0 from OnlinePartition to OnlinePartition (state.change.logger)
kafka_2.12-2.4.1/logs/state-change.log.2020-11-28-01:[2020-11-28 01:51:02,074] ERROR [Controller id=0 epoch=20] Controller 0 epoch 20 failed to change state for partition __consumer_offsets-25 from OnlinePartition to OnlinePartition (state.change.logger)
kafka_2.12-2.4.1/logs/state-change.log.2020-11-28-01:[2020-11-28 01:51:02,074] ERROR [Controller id=0 epoch=20] Controller 0 epoch 20 failed to change state for partition fcp-FFF-fcp-cs-201907181035-0 from OnlinePartition to OnlinePartition (state.change.logger)
kafka_2.12-2.4.1/logs/state-change.log.2020-11-28-01:[2020-11-28 01:51:02,075] ERROR [Controller id=0 epoch=20] Controller 0 epoch 20 failed to change state for partition fcp-PFINMONEYMONITOR-REPAYMENT-201806271142-1 from OnlinePartition to OnlinePartition (state.change.logger)
kafka_2.12-2.4.1/logs/state-change.log.2020-11-28-01:[2020-11-28 01:51:02,075] ERROR [Controller id=0 epoch=20] Controller 0 epoch 20 failed to change state for partition FCP_DATA_ONEPONE20180607-1 from OnlinePartition to OnlinePartition (state.change.logger)
kafka_2.12-2.4.1/logs/state-change.log.2020-11-28-01:[2020-11-28 01:51:02,075] ERROR [Controller id=0 epoch=20] Controller 0 epoch 20 failed to change state for partition __consumer_offsets-49 from OnlinePartition to OnlinePartition (state.change.logger)
kafka_2.12-2.4.1/logs/state-change.log.2020-11-28-01:[2020-11-28 01:51:02,075] ERROR [Controller id=0 epoch=20] Controller 0 epoch 20 failed to change state for partition fcp-PFINMONEYMONITOR-REPAYMENTPLAN-201806271154-0 from OnlinePartition to OnlinePartition (state.change.logger)
kafka_2.12-2.4.1/logs/state-change.log.2020-11-28-01:[2020-11-28 01:51:02,075] ERROR [Controller id=0 epoch=20] Controller 0 epoch 20 failed to change state for partition fcp-FFF-themis-cost-201911121623-0 from OnlinePartition to OnlinePartition (state.change.logger)
kafka_2.12-2.4.1/logs/state-change.log.2020-11-28-01:[2020-11-28 01:51:02,075] ERROR [Controller id=0 epoch=20] Controller 0 epoch 20 failed to change state for partition __consumer_offsets-16 from OnlinePartition to OnlinePartition (state.change.logger)
kafka_2.12-2.4.1/logs/state-change.log.2020-11-28-01:[2020-11-28 01:51:02,075] ERROR [Controller id=0 epoch=20] Controller 0 epoch 20 failed to change state for partition __consumer_offsets-28 from OnlinePartition to OnlinePartition (state.change.logger)
{code}
{code:java}
// controller info
fka_2.12-2.4.1/logs/controller.log.2020-11-28-01:[2020-11-28 01:51:02,078] ERROR [Controller id=0] Error completing replica leader election (PREFERRED) for partition __consumer_offsets-22 (kafka.controller.KafkaController)
kafka_2.12-2.4.1/logs/controller.log.2020-11-28-01:[2020-11-28 01:51:02,078] ERROR [Controller id=0] Error completing replica leader election (PREFERRED) for partition fcp-FFF-account-201807131719-2 (kafka.controller.KafkaController)
kafka_2.12-2.4.1/logs/controller.log.2020-11-28-01:[2020-11-28 01:51:02,078] ERROR [Controller id=0] Error completing replica leader election (PREFERRED) for partition fcp-PCP-INSTRANSACTIONPOLICY-2018079116-0 (kafka.controller.KafkaController)
kafka_2.12-2.4.1/logs/controller.log.2020-11-28-01:[2020-11-28 01:51:02,078] ERROR [Controller id=0] Error completing replica leader election (PREFERRED) for partition LOAN_FAIL_MANAGE-202011231831270534-1 (kafka.controller.KafkaController)
kafka_2.12-2.4.1/logs/controller.log.2020-11-28-01:[2020-11-28 01:51:02,078] ERROR [Controller id=0] Error completing replica leader election (PREFERRED) for partition fcp-FFF-LOANTXNSUB-201806271129-0 (kafka.controller.KafkaController)
kafka_2.12-2.4.1/logs/controller.log.2020-11-28-01:[2020-11-28 01:51:02,078] ERROR [Controller id=0] Error completing replica leader election (PREFERRED) for partition __consumer_offsets-4 (kafka.controller.KafkaController)
kafka_2.12-2.4.1/logs/controller.log.2020-11-28-01:[2020-11-28 01:51:02,078] ERROR [Controller id=0] Error completing replica leader election (PREFERRED) for partition __consumer_command_request-5 (kafka.controller.KafkaController)
kafka_2.12-2.4.1/logs/controller.log.2020-11-28-01:[2020-11-28 01:51:02,078] ERROR [Controller id=0] Error completing replica leader election (PREFERRED) for partition fcp-creditcore-loan-trans-201809112022-2 (kafka.controller.KafkaController)
kafka_2.12-2.4.1/logs/controller.log.2020-11-28-01:[2020-11-28 01:51:02,079] ERROR [Controller id=0] Error completing replica leader election (PREFERRED) for partition fcp-CREDITCORE-LOAN-TRANS-20180791126-0 (kafka.controller.KafkaController)
kafka_2.12-2.4.1/logs/controller.log.2020-11-28-01:[2020-11-28 01:51:02,079] ERROR [Controller id=0] Error completing replica leader election (PREFERRED) for partition __consumer_offsets-7 (kafka.controller.KafkaController)
{code}

> updateIsr should stop after failed several times due to zkVersion issue
> -----------------------------------------------------------------------
>
> Key: KAFKA-3042
> URL: https://issues.apache.org/jira/browse/KAFKA-3042
> Project: Kafka
> Issue Type: Bug
> Components: controller
> Affects Versions: 0.10.0.0, 2.4.1
> Environment: jdk 1.7
> centos 6.4
> Reporter: Jiahongchao
> Assignee: Dong Lin
> Priority: Critical
> Labels: reliability
> Attachments: controller.log, server.log.2016-03-23-01, state-change.log
>
>
> Sometimes one broker may repeatedly log
> "Cached zkVersion 54 not equal to that in zookeeper, skip updating ISR"
> I think this is because the broker considers itself the leader when in fact it is
> a follower.
> So after several failed tries, it needs to find out who the leader is.
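
For illustration only, the behaviour the issue asks for (stop retrying after several version mismatches, then find out who the leader is) could be sketched roughly as below. This is not Kafka's code: `FakeZk`, `PartitionState`, `shrink_isr` and `max_attempts` are hypothetical names, and the in-memory store only imitates a version-checked write.

```python
from dataclasses import dataclass


@dataclass
class PartitionState:
    """Hypothetical stand-in for the leader/ISR state kept in ZooKeeper."""
    leader: int
    isr: tuple
    version: int


class FakeZk:
    """In-memory store with a version-checked (compare-and-set) write."""

    def __init__(self, state):
        self.state = state

    def read(self):
        return self.state

    def conditional_update(self, leader, isr, expected_version):
        # The write succeeds only if the caller's cached version is current.
        if expected_version != self.state.version:
            return False
        self.state = PartitionState(leader, isr, self.state.version + 1)
        return True


def shrink_isr(zk, broker_id, cached_version, new_isr, max_attempts=3):
    """Try a version-checked ISR update, giving up after max_attempts.

    Returns "updated", "not_leader" if the refreshed state shows another
    broker leads the partition, or "conflict" if the final retry also fails.
    """
    for _ in range(max_attempts):
        if zk.conditional_update(broker_id, new_isr, cached_version):
            return "updated"
    # Repeated version mismatches: stop retrying and re-read who leads.
    current = zk.read()
    if current.leader != broker_id:
        return "not_leader"
    # Still the leader: retry once with the refreshed version.
    if zk.conditional_update(broker_id, new_isr, current.version):
        return "updated"
    return "conflict"
```

With a stale cached version and a different leader recorded in the store, `shrink_isr` returns `"not_leader"` instead of logging the mismatch forever, which is the point at which a broker would need to resume following the real leader.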