[ https://issues.apache.org/jira/browse/KAFKA-12958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17369935#comment-17369935 ]
HaiyuanZhao edited comment on KAFKA-12958 at 6/26/21, 6:25 PM:
---------------------------------------------------------------

Hi [~jagsancio],

I added an invariant that notified leaders are never asked to load snapshots. However, the test case canRecoverAfterAllNodesKilled fails with it, and the failure is easy to reproduce. The details follow.

*New Invariant*
The verification logic is as follows:
{code:java}
// Static imports used by the enclosing test class (JUnit 5 assertions assumed)
import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertFalse;
import static org.junit.jupiter.api.Assertions.assertTrue;

import java.util.OptionalInt;

// Invariant: a node that currently reports itself as leader has no recorded handleSnapshot calls.
private static class LeaderNeverLoadSnapshot implements Invariant {
    final Cluster cluster;
    int epoch = 0;
    OptionalInt leaderId = OptionalInt.empty();

    private LeaderNeverLoadSnapshot(Cluster cluster) {
        this.cluster = cluster;
    }

    @Override
    public void verify() {
        for (RaftNode raftNode : cluster.running()) {
            if (raftNode.counter.isLeader()) {
                // A leader must not have received any handleSnapshot callback
                assertFalse(raftNode.counter.isHandleSnapshotCalled());
                assertEquals(0, raftNode.counter.getHandleSnapshotCalls());
            } else if (raftNode.counter.isHandleSnapshotCalled()) {
                // Non-leaders may load snapshots; the flag and the call count must agree
                assertTrue(raftNode.counter.getHandleSnapshotCalls() > 0);
            } else {
                assertEquals(0, raftNode.counter.getHandleSnapshotCalls());
            }
        }
    }
}
{code}

*Run Result*
The handleSnapshot call stack is shown below. It indicates that a new leader may catch up by loading a snapshot if its listener is lagging. The fireSnapshot call comes from KAFKA-12154 (revision 6203bf8). I am not sure whether this is expected. Could you please take a look?

!image-2021-06-27-02-09-25-296.png!
!image-2021-06-27-02-15-23-760.png!

was (Author: zhaohaidao):
Hi [~jagsancio],

I added an invariant that notified leaders are never asked to load snapshots. However, the test case canRecoverAfterAllNodesKilled fails with it, and the failure is easy to reproduce. The details follow.

*New Invariant*

*Run Result*
The handleSnapshot call stack is shown below. It indicates that a new leader may catch up by loading a snapshot if its listener is lagging. The fireSnapshot call comes from KAFKA-12154 (revision 6203bf8).
I am not sure whether this is expected. Could you please take a look?

!image-2021-06-27-02-09-25-296.png!
!image-2021-06-27-02-15-23-760.png!

> Add simulation invariant for leadership and snapshot
> ----------------------------------------------------
>
>                 Key: KAFKA-12958
>                 URL: https://issues.apache.org/jira/browse/KAFKA-12958
>             Project: Kafka
>          Issue Type: Sub-task
>            Reporter: Jose Armando Garcia Sancio
>            Assignee: HaiyuanZhao
>            Priority: Major
>         Attachments: image-2021-06-27-02-09-25-296.png,
> image-2021-06-27-02-15-23-760.png
>
>
> During the simulation we should add an invariant that notified leaders are
> never asked to load snapshots. The state machine always sees the following
> sequence of callback calls:
> Leaders see:
> ...
> handleLeaderChange: the state machine is notified of leadership
> handleSnapshot is never called
> Non-leaders see:
> ...
> handleLeaderChange: the state machine is notified that it is not the leader
> handleSnapshot is called 0 or more times

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
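The issue description states the invariant as an ordering of callbacks rather than as a lifetime count of handleSnapshot calls. Below is a minimal, hypothetical sketch of a per-node checker for that ordering. CallbackSequenceChecker, onLeaderChange and onSnapshot are made-up names, not Kafka's RaftClient.Listener API, and the sketch assumes the rule means: once a node has been told it is leader, handleSnapshot must not be called until the next handleLeaderChange. Under that assumption, a snapshot loaded before the leadership notification (for example, by a lagging listener) is allowed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical checker for the leader/snapshot callback ordering in KAFKA-12958.
 * Not part of Kafka; it only models the assumed rule that no handleSnapshot may
 * arrive between a "you are leader" notification and the next leader change.
 */
public final class CallbackSequenceChecker {
    // True between a leadership notification and the next leader change.
    private boolean notifiedAsLeader = false;
    // Epoch from the most recent leader change (-1 before any notification).
    private int currentEpoch = -1;
    // handleSnapshot calls observed since the most recent leader change.
    private int snapshotsSinceLeaderChange = 0;
    private final List<String> violations = new ArrayList<>();

    /** Models a handleLeaderChange callback; isLeader says whether this node was elected. */
    public void onLeaderChange(int epoch, boolean isLeader) {
        currentEpoch = epoch;
        notifiedAsLeader = isLeader;
        snapshotsSinceLeaderChange = 0;
    }

    /** Models a handleSnapshot callback and records a violation if the node was notified as leader. */
    public void onSnapshot() {
        snapshotsSinceLeaderChange++;
        if (notifiedAsLeader) {
            violations.add("handleSnapshot called in epoch " + currentEpoch
                    + " after leadership notification");
        }
    }

    public boolean isValid() {
        return violations.isEmpty();
    }

    public int snapshotsSinceLeaderChange() {
        return snapshotsSinceLeaderChange;
    }

    public List<String> violations() {
        return Collections.unmodifiableList(violations);
    }

    public static void main(String[] args) {
        // A non-leader may load any number of snapshots.
        CallbackSequenceChecker follower = new CallbackSequenceChecker();
        follower.onLeaderChange(1, false);
        follower.onSnapshot();
        follower.onSnapshot();
        System.out.println(follower.isValid()); // prints true

        // A snapshot before the leadership notification is allowed; one after it is not.
        CallbackSequenceChecker leader = new CallbackSequenceChecker();
        leader.onSnapshot();
        leader.onLeaderChange(2, true);
        leader.onSnapshot();
        System.out.println(leader.isValid()); // prints false
    }
}
```

Tracking calls since the last leader change, instead of a cumulative counter, is the design choice that separates "leader loaded a snapshot while catching up before being notified" from "leader was asked to load a snapshot after being notified".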