kevin-wu24 commented on code in PR #18987: URL: https://github.com/apache/kafka/pull/18987#discussion_r1980465079
##########
raft/src/test/java/org/apache/kafka/raft/RaftEventSimulationTest.java:
##########

```
@@ -297,18 +309,83 @@ void canMakeProgressIfMajorityIsReachable(
         scheduler.runUntil(() -> cluster.allReachedHighWatermark(2 * restoredLogEndOffset));
     }
 
+    @Property(tries = 100, afterFailure = AfterFailureMode.SAMPLE_ONLY)
+    void canAddVotersIfMajorityIsReachable(
+        @ForAll int seed,
+        @ForAll @IntRange(min = 1, max = 3) int numObservers
+    ) {
+        int numVoters = 5;
+        Random random = new Random(seed);
+        Cluster cluster = new Cluster(numVoters, numObservers, random, true);
+        MessageRouter router = new MessageRouter(cluster);
+        EventScheduler scheduler = schedulerWithDefaultInvariants(cluster);
+        Set<Integer> expectedVoterIds = new HashSet<>(cluster.initialVoters.keySet());
+
+        initializeClusterAndStartAppending(cluster, router, scheduler, 10);
+
+        // Partition the nodes into two sets. Nodes are reachable within each set,
+        // but the two sets cannot communicate with each other. We should be able
+        // to make progress even if an election is needed in the larger set.
+        int firstObserverId = numVoters;
+        router.filter(
+            0,
+            new DropOutboundRequestsTo(cluster.endpointsFromIds(Set.of(2, 3, 4, firstObserverId)))
+        );
+        router.filter(
+            1,
+            new DropOutboundRequestsTo(cluster.endpointsFromIds(Set.of(2, 3, 4, firstObserverId)))
+        );
+        router.filter(2, new DropOutboundRequestsTo(cluster.endpointsFromIds(Set.of(0, 1))));
+        router.filter(3, new DropOutboundRequestsTo(cluster.endpointsFromIds(Set.of(0, 1))));
+        router.filter(4, new DropOutboundRequestsTo(cluster.endpointsFromIds(Set.of(0, 1))));
+        router.filter(firstObserverId, new DropOutboundRequestsTo(cluster.endpointsFromIds(Set.of(0, 1))));
+
+        long partitionLogEndOffset = cluster.maxLogEndOffset();
+        scheduler.runUntil(() -> cluster.anyReachedHighWatermark(2 * partitionLogEndOffset));
+
+        long minorityHighWatermark = cluster.maxHighWatermarkReached(Set.of(0, 1));
+        long majorityHighWatermark = cluster.maxHighWatermarkReached(Set.of(2, 3, 4));
+
+        assertTrue(
+            majorityHighWatermark > minorityHighWatermark,
+            String.format(
+                "majorityHighWatermark = %s, minorityHighWatermark = %s",
+                majorityHighWatermark,
+                minorityHighWatermark
+            )
+        );
+
+        // Verify we can add a voter, since a majority is still reachable
+        addVoter(cluster, scheduler, firstObserverId, expectedVoterIds);
+
+        // Now restore the partition and verify everyone catches up
+        router.filter(0, new PermitAllTraffic());
+        router.filter(1, new PermitAllTraffic());
+        router.filter(2, new PermitAllTraffic());
+        router.filter(3, new PermitAllTraffic());
+        router.filter(4, new PermitAllTraffic());
+        router.filter(firstObserverId, new PermitAllTraffic());
+
+        long restoredLogEndOffset = cluster.maxLogEndOffset();
+        scheduler.runUntil(() -> cluster.allReachedHighWatermark(2 * restoredLogEndOffset));
+    }
+
     @Property(tries = 100, afterFailure = AfterFailureMode.SAMPLE_ONLY)
     void leadershipAssignedOnlyOnceWithNetworkPartitionIfThereExistsMajority(
         @ForAll int seed,
-        @ForAll @IntRange(min = 0, max = 3) int numObservers
+        @ForAll @IntRange(min = 0, max = 3) int numObservers,
+        @ForAll boolean withKip853
     ) {
         int numVoters = 5;
         Random random = new Random(seed);
-        Cluster cluster = new Cluster(numVoters, numObservers, random);
+        Cluster cluster = new Cluster(numVoters, numObservers, random, withKip853);
         MessageRouter router = new MessageRouter(cluster);
         EventScheduler scheduler = schedulerWithDefaultInvariants(cluster);
         scheduler.addInvariant(new StableLeadership(cluster));
 
+        // Start cluster
+        cluster.startAll();
```

Review Comment:
   I guess what we mean by "cluster starting" is not super clear. I think the intention of this test is to have a network partition exist before the KRaft protocol starts (i.e. nodes sending messages to each other and performing the initial leader election). This is what it still does now, since the partition is defined before the calls to the following:
   ```
   schedulePolling(scheduler, cluster, 3, 5);
   scheduler.schedule(router::deliverAll, 0, 2, 1);
   scheduler.schedule(new SequentialAppendAction(cluster), 0, 2, 1);
   scheduler.runUntil(cluster::hasConsistentLeader);
   ```
   What `cluster.startAll()` does is just update the `Cluster`'s state and initialize all the state KRaft uses, but if no messages are being delivered and no polling is occurring, I don't think we've actually "started" yet. I think the function name is misleading; it should be something like `initializeNodes()`. What "starts" KRaft, from my perspective, are the lines above.
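The distinction the review comment draws, between initializing node state and actually "starting" the protocol once polling and message delivery are scheduled, can be illustrated with a toy sketch. All names below are hypothetical and stand in for the Kafka simulation harness; this is not the actual `RaftEventSimulationTest` code.

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class InitVsStart {
    static class Node {
        boolean initialized = false;
        int messagesSeen = 0;

        // Analogous to cluster.startAll() in the review: sets up state,
        // but sends and processes no protocol traffic.
        void initialize() {
            initialized = true;
        }

        // Analogous to scheduling polling/delivery: protocol activity
        // only happens once this runs.
        void poll(Queue<String> network) {
            String msg = network.poll();
            if (msg != null) {
                messagesSeen++;
            }
        }
    }

    public static void main(String[] args) {
        Queue<String> network = new ArrayDeque<>();
        network.add("VoteRequest");

        Node node = new Node();
        node.initialize();
        // Initialized, but no protocol traffic has been processed yet.
        System.out.println(node.initialized + " " + node.messagesSeen); // true 0

        node.poll(network);
        // Only once polling runs does the node see network traffic.
        System.out.println(node.messagesSeen); // 1
    }
}
```

In this framing, a rename like the suggested `initializeNodes()` makes the two phases explicit: state setup is inert until the scheduler begins delivering messages and polling the nodes.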