[ 
https://issues.apache.org/jira/browse/KAFKA-12958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17369935#comment-17369935
 ] 

HaiyuanZhao edited comment on KAFKA-12958 at 6/26/21, 6:29 PM:
---------------------------------------------------------------

Hi [~jagsancio], I added an invariant that notified leaders are never asked to 
load snapshots. However, with this invariant the test case 
canRecoverAfterAllNodesKilled fails, and the failure is easy to reproduce. 
The details follow.

*New Invariant*

The verification logic is as follows:
{code:java}
// Invariant: a node that is currently the leader must never have had
// handleSnapshot called on its state machine.
private static class LeaderNeverLoadSnapshot implements Invariant {
    final Cluster cluster;
    int epoch = 0;
    OptionalInt leaderId = OptionalInt.empty();

    private LeaderNeverLoadSnapshot(Cluster cluster) {
        this.cluster = cluster;
    }

    @Override
    public void verify() {
        for (RaftNode raftNode : cluster.running()) {
            if (raftNode.counter.isLeader()) {
                // Leaders: no snapshot load may have been observed.
                assertFalse(raftNode.counter.isHandleSnapshotCalled());
                assertTrue(raftNode.counter.getHandleSnapshotCalls() == 0);
            } else {
                // Non-leaders: the flag and the call count must agree.
                if (raftNode.counter.isHandleSnapshotCalled()) {
                    assertTrue(raftNode.counter.getHandleSnapshotCalls() > 0);
                } else {
                    assertTrue(raftNode.counter.getHandleSnapshotCalls() == 0);
                }
            }
        }
    }
}
{code}
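The invariant above reads three values from each node's counter: isLeader(), isHandleSnapshotCalled() and getHandleSnapshotCalls(). The counter itself is not shown in this comment, so the following is only a minimal sketch of how such bookkeeping might look. The class name SnapshotTrackingCounter, the simplified callback signatures, and the choice to reset the count on every leader change are all assumptions made for illustration, not the actual test code.

```java
import java.util.OptionalInt;

// Hypothetical bookkeeping for the callbacks a node's state machine receives.
// Names and signatures are illustrative assumptions, not the real listener API.
final class SnapshotTrackingCounter {
    private final int nodeId;
    private boolean leader = false;
    private int handleSnapshotCalls = 0;

    SnapshotTrackingCounter(int nodeId) {
        this.nodeId = nodeId;
    }

    // Simplified stand-in for the leader-change notification.
    void handleLeaderChange(OptionalInt leaderId) {
        leader = leaderId.isPresent() && leaderId.getAsInt() == nodeId;
        // Assumption: start counting afresh on each notification, so that the
        // invariant only covers snapshot loads seen since the node was last
        // told who the leader is.
        handleSnapshotCalls = 0;
    }

    // Simplified stand-in for the snapshot-load notification.
    void handleSnapshot() {
        handleSnapshotCalls++;
    }

    boolean isLeader() {
        return leader;
    }

    boolean isHandleSnapshotCalled() {
        return handleSnapshotCalls > 0;
    }

    int getHandleSnapshotCalls() {
        return handleSnapshotCalls;
    }
}
```

With a counter like this, a snapshot loaded while the node was still a follower would not trip the leader branch of the invariant, whereas a snapshot loaded after the leadership notification would. Whether the real counter resets this way is not stated here.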
*Run Result*

The root caller of handleSnapshot and its call stack are shown below. The call 
stack indicates that a new leader may catch up by loading a snapshot if its 
listener is lagging. The fireSnapshot call was introduced by KAFKA-12154 
(revision 6203bf8).

I am not sure if this is expected. Could you please take a look?
{code:java}
private void onUpdateLeaderHighWatermark(
    LeaderState<T> state,
    long currentTimeMs
) {
    state.highWatermark().ifPresent(highWatermark -> {
        ...

        // It is also possible that the high watermark is being updated
        // for the first time following the leader election, so we need
        // to give lagging listeners an opportunity to catch up as well
        updateListenersProgress(highWatermark.offset);
    });
}
{code}
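The comment in the excerpt above says that lagging listeners get a chance to catch up when the high watermark is first updated after an election. One way to picture why that can lead to a snapshot load on a new leader is the simplified model below. It is a sketch under assumptions: the class ListenerProgressModel, the Action enum, and the three offsets as plain longs are hypothetical and do not reproduce the actual updateListenersProgress implementation.

```java
// Hypothetical, simplified model of how a listener might be brought up to date.
// It does not reproduce the real Raft client code.
final class ListenerProgressModel {
    enum Action { LOAD_SNAPSHOT, READ_LOG, NOTHING }

    // listenerNextOffset: next offset the listener expects to receive
    // logStartOffset:     first offset still present in the log; earlier
    //                     records are assumed to exist only in a snapshot
    // highWatermark:      offset up to which records are committed
    static Action nextAction(long listenerNextOffset, long logStartOffset, long highWatermark) {
        if (listenerNextOffset < logStartOffset) {
            // The records the listener needs are no longer in the log.
            return Action.LOAD_SNAPSHOT;
        }
        if (listenerNextOffset < highWatermark) {
            // The listener can catch up from committed log records.
            return Action.READ_LOG;
        }
        return Action.NOTHING;
    }
}
```

Note that this model never looks at whether the node is the leader. Under that assumption, a newly elected leader whose listener is behind the log start offset would be handed a snapshot just like a follower would, which matches the behavior described in this comment.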
!image-2021-06-27-02-27-41-966.png!

 


was (Author: zhaohaidao):
Hi [~jagsancio], I added an invariant that notified leaders are never asked to 
load snapshots. However, with this invariant the test case 
canRecoverAfterAllNodesKilled fails, and the failure is easy to reproduce. 
The details follow.

*New Invariant*

The verification logic is as follows:
{code:java}
private static class LeaderNeverLoadSnapshot implements Invariant {
    final Cluster cluster;
    int epoch = 0;
    OptionalInt leaderId = OptionalInt.empty();

    private LeaderNeverLoadSnapshot(Cluster cluster) {
        this.cluster = cluster;
    }

    @Override
    public void verify() {
        for (RaftNode raftNode : cluster.running()) {
            if (raftNode.counter.isLeader()) {
                assertFalse(raftNode.counter.isHandleSnapshotCalled());
                assertTrue(raftNode.counter.getHandleSnapshotCalls() == 0);
            } else {
                if (raftNode.counter.isHandleSnapshotCalled()) {
                    assertTrue(raftNode.counter.getHandleSnapshotCalls() > 0);
                } else {
                    assertTrue(raftNode.counter.getHandleSnapshotCalls() == 0);
                }
            }

        }
    }
}
{code}
*Run Result*

The handleSnapshot call stack is shown below. It indicates that a new leader 
may catch up by loading a snapshot if its listener is lagging. The fireSnapshot 
call was introduced by KAFKA-12154 (revision 6203bf8).

I am not sure if this is expected. Could you please take a look?

!image-2021-06-27-02-27-41-966.png!

 

> Add simulation invariant for leadership and snapshot
> ----------------------------------------------------
>
>                 Key: KAFKA-12958
>                 URL: https://issues.apache.org/jira/browse/KAFKA-12958
>             Project: Kafka
>          Issue Type: Sub-task
>            Reporter: Jose Armando Garcia Sancio
>            Assignee: HaiyuanZhao
>            Priority: Major
>         Attachments: image-2021-06-27-02-09-25-296.png, 
> image-2021-06-27-02-15-23-760.png, image-2021-06-27-02-26-48-368.png, 
> image-2021-06-27-02-27-41-966.png
>
>
> During the simulation we should add an invariant that notified leaders are 
> never asked to load snapshots. The state machine always sees the following 
> sequence of callback calls:
> Leaders see:
> ...
> handleLeaderChange state machine is notified of leadership
> handleSnapshot is never called
> Non-leaders see:
> ...
> handleLeaderChange state machine is notified that it is not the leader
> handleSnapshot is called 0 or more times



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
