[
https://issues.apache.org/jira/browse/KUDU-2943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16935313#comment-16935313
]
HeLifu edited comment on KUDU-2943 at 9/23/19 8:51 AM:
-------------------------------------------------------
If we step down a tablet's leader replica, the leader's term is increased by 1
in memory but is not persisted:
https://github.com/apache/kudu/blob/ee22ddcc734ab4947218c670d5cfddd61fe90fbb/src/kudu/consensus/raft_consensus.cc#L570
Then, after a successful election, one of the followers becomes the new leader
and its term is increased by 1 as well.
The term is durable on the new leader, but not on the old one. This is the
root cause.
https://github.com/apache/kudu/blob/ee22ddcc734ab4947218c670d5cfddd61fe90fbb/src/kudu/consensus/raft_consensus.cc#L1138
So the StepDown API is not safe.
tablet: ac74b319ad54416685f8b9d9506e1d61
f42c56 c2c8be eea10e
| | |
| start election |
| WON |
| leader(1,0) |
(1,0) | (1,0)
| NO_OP(1,1) |
(1,1) | (1,1)
| Write some Rows(1,2) |
(1,2) | (1,2)
| **StepDown(1/2,2)[term 2 is not durable] |
start election(1/2,2) | |
| | start election(1/2,2)
WON |
| FAIL
leader(2,2)[term 2 is durable] |
| (2,2)[term 2 is durable]
NO_OP(2,3) |
| (2,2)[not receive NO_OP]
**StepDown(2/3,3)[term 3 is not durable]"Line 570" |
| start election(2/3,2)
| WON
| leader(3,2)[term 3 is durable]
| |
| NO_OP(3,3)
(3,3)[term 3 is not durable]"Line 1138" |
| alter schema(3,4)
(3,4)[term 3 is not durable] |
| |
| [restart masters]
| [restart tservers]
|
**The tablet fails to restart because the term in the consensus metadata is 2
while the last opid in the WAL is (3,4)
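
The timeline above can be replayed with a small toy model. This is only a sketch of the behavior as the comment describes it, not Kudu's code: `ToyReplica`, its fields, and `replay_f42c56` are hypothetical names, and the rule that a follower flushes its term only when the incoming term is higher than its in-memory term is an assumption taken from the "Line 1138" note.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

OpId = Tuple[int, int]  # (term, index)


@dataclass
class ToyReplica:
    """Hypothetical stand-in for one tablet replica (not a Kudu class)."""
    durable_term: int = 0   # what the on-disk consensus metadata would hold
    current_term: int = 0   # in-memory view of the term
    wal: List[OpId] = field(default_factory=list)

    def win_election(self, term: int) -> None:
        # A newly elected leader persists its term.
        self.current_term = term
        self.durable_term = term

    def step_down(self) -> None:
        # As described in the comment: the term is bumped in memory only.
        self.current_term += 1

    def replicate(self, op: OpId) -> None:
        term, index = op
        if term > self.current_term:
            # Assumption: a term advance learned from the leader is flushed.
            self.current_term = term
            self.durable_term = term
        # If term == current_term (already bumped by step_down), nothing is
        # flushed, so durable_term stays behind.
        self.wal = [o for o in self.wal if o[1] < index]  # drop conflicting ops
        self.wal.append(op)

    def restart(self) -> None:
        # Only durable state survives: the WAL and durable_term.
        self.current_term = self.durable_term


def replay_f42c56() -> ToyReplica:
    r = ToyReplica()
    r.replicate((1, 1))   # NO_OP from the first leader
    r.replicate((1, 2))   # row writes
    r.win_election(2)     # term 2 is durable
    r.wal.append((2, 3))  # the leader's own NO_OP
    r.step_down()         # term 3 exists in memory only
    r.replicate((3, 3))   # NO_OP from the term-3 leader
    r.replicate((3, 4))   # alter schema
    r.restart()
    return r
```

After `replay_f42c56()`, the toy replica holds a durable term of 2 while the last WAL entry is `(3, 4)`, which is the disagreement reported in this issue.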
was (Author: helifu):
I think the term 3 for f42c56 is not durable. That means the StepDown API is
not safe.
{code:java}
tablet: ac74b319ad54416685f8b9d9506e1d61
f42c56 c2c8be eea10e
| | |
| start election |
| | |
(1,0) leader(1,0) (1,0)
| | |
(1,1) NO_OP(1,1) (1,1)
| | |
(1,2) Write some Rows(1,2) (1,2)
| | |
| StepDown(2, 2) |
start election(2,2) | |
| | start election(2,2)
WIN |
| FAIL
leader(2,2)[term is durable] |
| |
NO_OP(2,3)[no sync] (2,2)[not receive NO_OP]
| |
****StepDown(3,3)[term is not durable]"Line 1489" |
| start election(3,2)
| |
| WIN
| |
| leader(3,2)
| |
| NO_OP(3,3)
| |
| alter schema(3,4)
(3,4)[term is not durable, op in WAL] |
| |
restart masters
restart tservers
{code}
> TsTabletManagerITest.TestTableStats flaky due to WAL/cmeta term disagreement
> ----------------------------------------------------------------------------
>
> Key: KUDU-2943
> URL: https://issues.apache.org/jira/browse/KUDU-2943
> Project: Kudu
> Issue Type: Bug
> Components: consensus, test
> Affects Versions: 1.11.0
> Reporter: Adar Dembo
> Priority: Critical
> Attachments: ts_tablet_manager-itest.txt
>
>
> This new test failed in a strange (and worrying) way:
> {noformat}
> /home/jenkins-slave/workspace/kudu-master/1/src/kudu/integration-tests/ts_tablet_manager-itest.cc:753:
> Failure
> Failed
> Bad status: Corruption: Unable to start RaftConsensus: The last op in the WAL
> with id 3.4 has a term (3) that is greater than the latest recorded term,
> which is 2
> {noformat}
> From a brief dig through the code, looks like this means the current term as
> per the on-disk cmeta file is older than the term in the latest WAL op.
> I can believe that this is somehow due to InternalMiniCluster exercising
> clean shutdown paths that aren't well tested or robust, but it'd be nice to
> determine that with certainty.
> I've attached the full test log.
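
The failure message quoted above amounts to a consistency check between the recorded term and the last WAL entry. A minimal sketch of such a check follows; `check_wal_against_cmeta` and `CorruptionError` are hypothetical names chosen for illustration, and only the wording of the message is taken from the log.

```python
from typing import List, Tuple


class CorruptionError(Exception):
    """Hypothetical error type used only in this sketch."""


def check_wal_against_cmeta(cmeta_term: int, wal: List[Tuple[int, int]]) -> None:
    """Raise if the last WAL op carries a term newer than the recorded term."""
    if not wal:
        return  # nothing to compare against
    term, index = wal[-1]
    if term > cmeta_term:
        raise CorruptionError(
            f"The last op in the WAL with id {term}.{index} has a term ({term}) "
            f"that is greater than the latest recorded term, which is {cmeta_term}"
        )
```

Calling `check_wal_against_cmeta(2, [(3, 4)])` raises with the same wording as the test log, while a recorded term of 3 passes.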