[ https://issues.apache.org/jira/browse/KUDU-3266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17304439#comment-17304439 ]
Bankim Bhavsar commented on KUDU-3266:
--------------------------------------

A=eb86b1c4913647fc927d576f744c3d27
B=3978845170a24b80a8903036a6e97382
C=00e14f42918b444bb2a05e9b4f2ac855

Log snippets:
{noformat}
I0317 17:04:14.694448 17546 raft_consensus.cc:479] T 00000000000000000000000000000000 P 00e14f42918b444bb2a05e9b4f2ac855 [term 4 FOLLOWER]: Starting leader election (detected failure of leader eb86b1c4913647fc927d576f744c3d27
I0317 17:04:18.775841 17549 sys_catalog.cc:437] T 00000000000000000000000000000000 P 3978845170a24b80a8903036a6e97382 [sys.catalog]: This master's current role is: LEADER
I0317 17:04:18.776212 17553 sys_catalog.cc:437] T 00000000000000000000000000000000 P 00e14f42918b444bb2a05e9b4f2ac855 [sys.catalog]: This master's current role is: FOLLOWER
{noformat}

After coming back from the pause, eb86b1c4913647fc927d576f744c3d27 (A) still thinks it is the leader, while 00e14f42918b444bb2a05e9b4f2ac855 (C) is trying to become leader.

{noformat}
I0317 17:04:19.643288 14988 tablet_service.cc:1729] Received RequestConsensusVote() RPC: tablet_id: "00000000000000000000000000000000" candidate_uuid: "00e14f42918b444bb2a05e9b4f2ac855" candidate_term: 5 candidate_status { last_received { term: 4 index: 3513 } } ignore_live_leader: false dest_uuid: "eb86b1c4913647fc927d576f744c3d27" is_pre_election: true
I0317 17:04:19.644940 17503 consensus_queue.cc:571] T 00000000000000000000000000000000 P eb86b1c4913647fc927d576f744c3d27 [LEADER]: Leader has been unable to successfully communicate with peer 3978845170a24b80a8903036a6e97382 for more than 4 seconds (6.459s)
I0317 17:04:19.645022 14987 tablet_service.cc:1729] Received RequestConsensusVote() RPC: tablet_id: "00000000000000000000000000000000" candidate_uuid: "00e14f42918b444bb2a05e9b4f2ac855" candidate_term: 5 candidate_status { last_received { term: 4 index: 3513 } } ignore_live_leader: false dest_uuid: "eb86b1c4913647fc927d576f744c3d27"
I0317 17:04:19.645114 17503 sys_catalog.cc:434] T 00000000000000000000000000000000 P eb86b1c4913647fc927d576f744c3d27 [sys.catalog]: SysCatalogTable state changed. Reason: Peer health change. Latest consensus state: current_term: 4 leader_uuid: "eb86b1c4913647fc927d576f744c3d27" committed_config { opid_index: 2852 OBSOLETE_local: false peers { permanent_uuid: "eb86b1c4913647fc927d576f744c3d27" member_type: VOTER last_known_addr { host: "127.0.92.253" port: 37459 } } peers { permanent_uuid: "3978845170a24b80a8903036a6e97382" member_type: VOTER last_known_addr { host: "127.0.92.252" port: 45331 } } peers { permanent_uuid: "00e14f42918b444bb2a05e9b4f2ac855" member_type: VOTER last_known_addr { host: "127.0.92.254" port: 43853 } attrs { promote: false } } }
I0317 17:04:19.645200 17503 sys_catalog.cc:437] T 00000000000000000000000000000000 P eb86b1c4913647fc927d576f744c3d27 [sys.catalog]: This master's current role is: LEADER
/data0/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/integration-tests/cluster_verifier.cc:119: Failure
Failed
Bad status: Not found: Unable to open table: the table does not exist: table_name: "table-1"
/data0/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/master/dynamic_multi_master-test.cc:603: Failure
Expected: cv.CheckRowCount(table_name, ClusterVerifier::EXACTLY, 0) doesn't generate new fatal failures in the current thread.
Actual: it does.
I0317 17:04:19.667603 371 external_mini_cluster.cc:1294] Killing /tmp/dist-test-task6JYMlq/build/debug/bin/kudu with pid 15089
I0317 17:04:19.673735 15061 raft_consensus.cc:1223] T 00000000000000000000000000000000 P 3978845170a24b80a8903036a6e97382 [term 6 LEADER]: Rejecting Update request from peer eb86b1c4913647fc927d576f744c3d27 for earlier term 4. Current term is 6. Ops: []
I0317 17:04:19.676132 14988 tablet_service.cc:1729] Received RequestConsensusVote() RPC: tablet_id: "00000000000000000000000000000000" candidate_uuid: "3978845170a24b80a8903036a6e97382" candidate_term: 6 candidate_status { last_received { term: 4 index: 3513 } } ignore_live_leader: false dest_uuid: "eb86b1c4913647fc927d576f744c3d27"
I0317 17:04:19.676213 14987 tablet_service.cc:1729] Received RequestConsensusVote() RPC: tablet_id: "00000000000000000000000000000000" candidate_uuid: "3978845170a24b80a8903036a6e97382" candidate_term: 6 candidate_status { last_received { term: 4 index: 3513 } } ignore_live_leader: false dest_uuid: "eb86b1c4913647fc927d576f744c3d27" is_pre_election: true
I0317 17:04:19.676512 14986 raft_consensus.cc:3027] T 00000000000000000000000000000000 P eb86b1c4913647fc927d576f744c3d27 [term 4 LEADER]: Stepping down as leader of term 4
I0317 17:04:19.676553 14986 raft_consensus.cc:726] T 00000000000000000000000000000000 P eb86b1c4913647fc927d576f744c3d27 [term 4 LEADER]: Becoming Follower/Learner. State: Replica: eb86b1c4913647fc927d576f744c3d27, State: Running, Role: LEADER
I0317 17:04:19.676688 14986 consensus_queue.cc:257] T 00000000000000000000000000000000 P eb86b1c4913647fc927d576f744c3d27 [NON_LEADER]: Queue going to NON_LEADER mode. State: All replicated index: 0, Majority replicated index: 3513, Committed index: 3513, Last appended: 4.3513, Last appended by leader: 3513, Current term: 4, Majority size: -1, State: 0, Mode: NON_LEADER, active raft config: opid_index: 2852 OBSOLETE_local: false peers { permanent_uuid: "eb86b1c4913647fc927d576f744c3d27" member_type: VOTER last_known_addr { host: "127.0.92.253" port: 37459 } } peers { permanent_uuid: "3978845170a24b80a8903036a6e97382" member_type: VOTER last_known_addr { host: "127.0.92.252" port: 45331 } } peers { permanent_uuid: "00e14f42918b444bb2a05e9b4f2ac855" member_type: VOTER last_known_addr { host: "127.0.92.254" port: 43853 } attrs { promote: false } }
{noformat}

> Flakiness in dynamic_multi_master_test in VerifyClusterAfterMasterAddition() function
> -------------------------------------------------------------------------------------
>
>                 Key: KUDU-3266
>                 URL: https://issues.apache.org/jira/browse/KUDU-3266
>             Project: Kudu
>          Issue Type: Test
>          Components: master, test
>    Affects Versions: 1.15.0
>            Reporter: Bankim Bhavsar
>            Assignee: Bankim Bhavsar
>            Priority: Major
>
> {noformat}
> ParameterizedRecoverMasterTest.TestRecoverDeadMasterSysCatalogCopy/1:
> /data0/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/integration-tests/cluster_verifier.cc:119: Failure
> Failed
> Bad status: Not found: Unable to open table: the table does not exist: table_name: "table-1"
> /data0/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/master/dynamic_multi_master-test.cc:603: Failure
> Expected: cv.CheckRowCount(table_name, ClusterVerifier::EXACTLY, 0) doesn't generate new fatal failures in the current thread.
> Actual: it does.
> 2021-03-17T17:04:19Z chronyd exiting
> /data0/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/master/dynamic_multi_master-test.cc:1099: Failure
> Expected: VerifyClusterAfterMasterAddition(master_hps, orig_num_masters_) doesn't generate new fatal failures in the current thread.
> Actual: it does.
> {noformat}
> Although the same verification function is used by other add-master tests, this flakiness started showing up after the introduction of the RecoverDeadMaster test.
> https://github.com/apache/kudu/commit/4b4a8c0f2fdfd15524510821b27fc9c3b5d26b6b
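
For readers less familiar with Raft, the stale-leader sequence in the log snippets (A resumes still believing it leads term 4, B at term 6 rejects A's Update, A then steps down) matches the generic Raft term rule. The following is a minimal sketch of that rule only. It is not Kudu's implementation: the `Replica` class and its method names are hypothetical, and details such as pre-elections and the exact point at which Kudu advances a replica's term are deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    """Hypothetical stand-in for one master's Raft state (not a Kudu class)."""
    uuid: str
    term: int
    role: str  # "LEADER" or "FOLLOWER"

    def on_update(self, sender_term: int) -> bool:
        """Handle an Update from a peer that claims to be leader.

        Returns True if accepted, False if the sender's term is stale.
        """
        if sender_term < self.term:
            # Same situation as "Rejecting Update request ... for earlier term".
            return False
        # Sender's term is current or newer: adopt it and follow that leader.
        self.term = sender_term
        self.role = "FOLLOWER"
        return True

    def observe_term(self, observed_term: int) -> None:
        """Step down if any message reveals a newer term than our own."""
        if observed_term > self.term:
            self.term = observed_term
            self.role = "FOLLOWER"


def replay() -> tuple:
    """Replay the scenario from the logs using the short names A and B."""
    a = Replica("A", term=4, role="LEADER")  # paused leader, just resumed
    b = Replica("B", term=6, role="LEADER")  # elected while A was paused
    accepted = b.on_update(a.term)           # A's term-4 Update reaches B
    a.observe_term(b.term)                   # A learns about term 6
    return accepted, a.role, a.term, b.role
```

Between A resuming and A stepping down, A still reports itself as LEADER, which is consistent with the role lines logged at 17:04:19.645200 and 17:04:19.676512.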