[ 
https://issues.apache.org/jira/browse/KUDU-3266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17304439#comment-17304439
 ] 

Bankim Bhavsar commented on KUDU-3266:
--------------------------------------

A=eb86b1c4913647fc927d576f744c3d27
B=3978845170a24b80a8903036a6e97382
C=00e14f42918b444bb2a05e9b4f2ac855

Log snippets:

{noformat}
I0317 17:04:14.694448 17546 raft_consensus.cc:479] T 
00000000000000000000000000000000 P 00e14f42918b444bb2a05e9b4f2ac855 [term 4 
FOLLOWER]: Starting leader election (detected failure of leader 
eb86b1c4913647fc927d576f744c3d27
I0317 17:04:18.775841 17549 sys_catalog.cc:437] T 
00000000000000000000000000000000 P 3978845170a24b80a8903036a6e97382 
[sys.catalog]: This master's current role is: LEADER
I0317 17:04:18.776212 17553 sys_catalog.cc:437] T 
00000000000000000000000000000000 P 00e14f42918b444bb2a05e9b4f2ac855 
[sys.catalog]: This master's current role is: FOLLOWER
{noformat}

A (eb86b1c4913647fc927d576f744c3d27), coming back from the pause, still thinks
it is the leader, while C (00e14f42918b444bb2a05e9b4f2ac855) is trying to
become leader.

{noformat}
I0317 17:04:19.643288 14988 tablet_service.cc:1729] Received 
RequestConsensusVote() RPC: tablet_id: "00000000000000000000000000000000" 
candidate_uuid: "00e14f42918b444bb2a05e9b4f2ac855" candidate_term: 5 
candidate_status { last_received { term: 4 index: 3513 } } ignore_live_leader: 
false dest_uuid: "eb86b1c4913647fc927d576f744c3d27" is_pre_election: true
I0317 17:04:19.644940 17503 consensus_queue.cc:571] T 
00000000000000000000000000000000 P eb86b1c4913647fc927d576f744c3d27 [LEADER]: 
Leader has been unable to successfully communicate with peer 
3978845170a24b80a8903036a6e97382 for more than 4 seconds (6.459s)
I0317 17:04:19.645022 14987 tablet_service.cc:1729] Received 
RequestConsensusVote() RPC: tablet_id: "00000000000000000000000000000000" 
candidate_uuid: "00e14f42918b444bb2a05e9b4f2ac855" candidate_term: 5 
candidate_status { last_received { term: 4 index: 3513 } } ignore_live_leader: 
false dest_uuid: "eb86b1c4913647fc927d576f744c3d27"
I0317 17:04:19.645114 17503 sys_catalog.cc:434] T 
00000000000000000000000000000000 P eb86b1c4913647fc927d576f744c3d27 
[sys.catalog]: SysCatalogTable state changed. Reason: Peer health change. 
Latest consensus state: current_term: 4 leader_uuid: 
"eb86b1c4913647fc927d576f744c3d27" committed_config { opid_index: 2852 
OBSOLETE_local: false peers { permanent_uuid: 
"eb86b1c4913647fc927d576f744c3d27" member_type: VOTER last_known_addr { host: 
"127.0.92.253" port: 37459 } } peers { permanent_uuid: 
"3978845170a24b80a8903036a6e97382" member_type: VOTER last_known_addr { host: 
"127.0.92.252" port: 45331 } } peers { permanent_uuid: 
"00e14f42918b444bb2a05e9b4f2ac855" member_type: VOTER last_known_addr { host: 
"127.0.92.254" port: 43853 } attrs { promote: false } } }
I0317 17:04:19.645200 17503 sys_catalog.cc:437] T 
00000000000000000000000000000000 P eb86b1c4913647fc927d576f744c3d27 
[sys.catalog]: This master's current role is: LEADER


/data0/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/integration-tests/cluster_verifier.cc:119:
 Failure
Failed
Bad status: Not found: Unable to open table: the table does not exist: 
table_name: "table-1"        
/data0/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/master/dynamic_multi_master-test.cc:603:
 Failure
Expected: cv.CheckRowCount(table_name, ClusterVerifier::EXACTLY, 0) doesn't 
generate new fatal failures in the current thread.
  Actual: it does.
I0317 17:04:19.667603   371 external_mini_cluster.cc:1294] Killing 
/tmp/dist-test-task6JYMlq/build/debug/bin/kudu with pid 15089
I0317 17:04:19.673735 15061 raft_consensus.cc:1223] T 
00000000000000000000000000000000 P 3978845170a24b80a8903036a6e97382 [term 6 
LEADER]: Rejecting Update request from peer eb86b1c4913647fc927d576f744c3d27 
for earlier term 4. Current term is 6. Ops: []
I0317 17:04:19.676132 14988 tablet_service.cc:1729] Received 
RequestConsensusVote() RPC: tablet_id: "00000000000000000000000000000000" 
candidate_uuid: "3978845170a24b80a8903036a6e97382" candidate_term: 6 
candidate_status { last_received { term: 4 index: 3513 } } ignore_live_leader: 
false dest_uuid: "eb86b1c4913647fc927d576f744c3d27"
I0317 17:04:19.676213 14987 tablet_service.cc:1729] Received 
RequestConsensusVote() RPC: tablet_id: "00000000000000000000000000000000" 
candidate_uuid: "3978845170a24b80a8903036a6e97382" candidate_term: 6 
candidate_status { last_received { term: 4 index: 3513 } } ignore_live_leader: 
false dest_uuid: "eb86b1c4913647fc927d576f744c3d27" is_pre_election: true
I0317 17:04:19.676512 14986 raft_consensus.cc:3027] T 
00000000000000000000000000000000 P eb86b1c4913647fc927d576f744c3d27 [term 4 
LEADER]: Stepping down as leader of term 4
I0317 17:04:19.676553 14986 raft_consensus.cc:726] T 
00000000000000000000000000000000 P eb86b1c4913647fc927d576f744c3d27 [term 4 
LEADER]: Becoming Follower/Learner. State: Replica: 
eb86b1c4913647fc927d576f744c3d27, State: Running, Role: LEADER
I0317 17:04:19.676688 14986 consensus_queue.cc:257] T 
00000000000000000000000000000000 P eb86b1c4913647fc927d576f744c3d27 
[NON_LEADER]: Queue going to NON_LEADER mode. State: All replicated index: 0, 
Majority replicated index: 3513, Committed index: 3513, Last appended: 4.3513, 
Last appended by leader: 3513, Current term: 4, Majority size: -1, State: 0, 
Mode: NON_LEADER, active raft config: opid_index: 2852 OBSOLETE_local: false 
peers { permanent_uuid: "eb86b1c4913647fc927d576f744c3d27" member_type: VOTER 
last_known_addr { host: "127.0.92.253" port: 37459 } } peers { permanent_uuid: 
"3978845170a24b80a8903036a6e97382" member_type: VOTER last_known_addr { host: 
"127.0.92.252" port: 45331 } } peers { permanent_uuid: 
"00e14f42918b444bb2a05e9b4f2ac855" member_type: VOTER last_known_addr { host: 
"127.0.92.254" port: 43853 } attrs { promote: false } }

{noformat}
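
The term handling visible in these logs can be sketched as follows. This is only a toy model of the generic Raft rule (reject requests from an earlier term, step down on seeing a higher one), not Kudu's actual code: the Replica class and its handle_update/observe_term methods are hypothetical names made up for illustration. It shows why A, still at term 4 after the pause, keeps reporting LEADER until it hears about term 6 from B.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    """Toy model of one Raft peer (hypothetical, not Kudu's RaftConsensus)."""
    uuid: str
    term: int
    role: str  # "LEADER" or "FOLLOWER"

    def handle_update(self, sender_term: int) -> bool:
        """Accept an Update only if the sender's term is not older than ours."""
        if sender_term < self.term:
            # Mirrors "Rejecting Update request ... for earlier term 4".
            return False
        self.observe_term(sender_term)
        return True

    def observe_term(self, seen_term: int) -> None:
        """Adopt any higher term and give up leadership."""
        if seen_term > self.term:
            self.term = seen_term
            self.role = "FOLLOWER"


a = Replica("A", term=4, role="LEADER")  # stale leader back from the pause
b = Replica("B", term=6, role="LEADER")  # leader elected in the meantime

accepted = b.handle_update(a.term)  # B refuses A's term-4 Update
a.observe_term(b.term)              # A hears of term 6 and steps down
print(accepted, a.role, a.term)
```

Until the second step happens, both A and B consider themselves leader, which matches the window between 17:04:18.775 and 17:04:19.676 in the snippets above; the test failure at cluster_verifier.cc:119 is logged inside that window.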

> Flakiness in dynamic_multi_master_test in VerifyClusterAfterMasterAddition() 
> function
> -------------------------------------------------------------------------------------
>
>                 Key: KUDU-3266
>                 URL: https://issues.apache.org/jira/browse/KUDU-3266
>             Project: Kudu
>          Issue Type: Test
>          Components: master, test
>    Affects Versions: 1.15.0
>            Reporter: Bankim Bhavsar
>            Assignee: Bankim Bhavsar
>            Priority: Major
>
> {noformat}
> ParameterizedRecoverMasterTest.TestRecoverDeadMasterSysCatalogCopy/1: 
> /data0/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/integration-tests/cluster_verifier.cc:119:
>  Failure
> Failed
> Bad status: Not found: Unable to open table: the table does not exist: 
> table_name: "table-1"
> /data0/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/master/dynamic_multi_master-test.cc:603:
>  Failure
> Expected: cv.CheckRowCount(table_name, ClusterVerifier::EXACTLY, 0) doesn't 
> generate new fatal failures in the current thread.
>   Actual: it does.
> 2021-03-17T17:04:19Z chronyd exiting
> /data0/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/master/dynamic_multi_master-test.cc:1099:
>  Failure
> Expected: VerifyClusterAfterMasterAddition(master_hps, orig_num_masters_) 
> doesn't generate new fatal failures in the current thread.
>   Actual: it does.
> {noformat}
> Although the same verification function is used by other tests that add a
> master, this flakiness started showing up after the introduction of the
> RecoverDeadMaster test.
> https://github.com/apache/kudu/commit/4b4a8c0f2fdfd15524510821b27fc9c3b5d26b6b



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
