kevin-wu24 commented on code in PR #18987:
URL: https://github.com/apache/kafka/pull/18987#discussion_r1971722138


##########
raft/src/test/java/org/apache/kafka/raft/RaftEventSimulationTest.java:
##########
@@ -1127,14 +1331,75 @@ private MajorityReachedHighWatermark(Cluster cluster) {
 
         @Override
         public void verify() {
-            cluster.leaderHighWatermark().ifPresent(highWatermark -> {
-                long numReachedHighWatermark = 
cluster.nodes.entrySet().stream()
-                    .filter(entry -> 
cluster.voters.containsKey(entry.getKey()))
-                    .filter(entry -> entry.getValue().log.endOffset().offset() 
>= highWatermark)
-                    .count();
-                assertTrue(
-                    numReachedHighWatermark >= cluster.majoritySize(),
-                    "Insufficient nodes have reached current high watermark");
+            if (cluster.withKip853) {
+                /*
+                * For clusters running in KIP-853 mode, we check that a 
majority of at least one of:
+                * 1. the leader's voter set at the HWM

Review Comment:
   Sorry, upon further looking into this, I think the issue is just slightly 
different that what I've described thus far.
   
   Essentially, when we fail the invariant check when only using 
`lastVoterSet()`, what's going on is that the caller of `verify()` is a 
different thread (lets say event scheduler thread) than the KRaft leader's 
raft-io thread which is continuously doing all of this internal state updating. 
The event scheduler thread is looking at a bunch of the leader's internal state 
(e.g. partitionState and highWatermark), which can definitely be in the 
following state: `partitionState` has been updated with a new voter set, but a 
new `highWatermark` value has not yet been calculated with this voter set yet, 
which could cause this invariant check to fail when only looking at 
`lastVoterSet()`. The leader may not be in this state for very long, since it's 
in this state in between the calls to `appendAsLeader -> updateState -> 
maybeLoadLog` and `flushLeaderLog -> maybeUpdateHighWatermark`, but it 
nevertheless looks like a valid state to me.
   
   Invariants as a concept are predicates that should be true for the system at 
all times, no matter when you check them, so I think performing invariant 
verification how it's currently implemented is fine, as we could be checking 
the internal states of our raft nodes at any point in their execution. However, 
it just means we have to consider these "intermediary" states when performing 
verification.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to