mumrah opened a new pull request, #12269:
URL: https://github.com/apache/kafka/pull/12269
This patch does two things to tighten up races during bootstrap. One is to
simply move the volatile write until after the "bootstrapMetadata" event has
been enqueued. The other thing is to prepend "bootstrapMetadata" to the queue.
From the JIRA, this test was flaky when we see the "registerBroker" event
precede the "boostrapMetadata" event
```
handleLeaderChange() start
appendWriteEvent(registerBroker)
appendWriteEvent(bootstrapMetadata)
handleLeaderChange() finish
registerBroker() -> writes broker registration to log
bootstrapMetadata() -> writes bootstrap metadata to log
```
The test harness in QuorumControllerTest is polling the controllers to see
when their current leader epoch matches the expected leader epoch
```java
QuorumController activeController() throws InterruptedException {
AtomicReference<QuorumController> value = new
AtomicReference<>(null);
TestUtils.retryOnExceptionWithTimeout(20000, 3, () -> {
LeaderAndEpoch leader = logEnv.leaderAndEpoch();
for (QuorumController controller : controllers) {
if
(OptionalInt.of(controller.nodeId()).equals(leader.leaderId()) &&
---> controller.curClaimEpoch() == leader.epoch()) { <---
value.set(controller);
break;
}
}
if (value.get() == null) {
throw new RuntimeException(String.format("Expected to see %s
as leader", leader));
}
});
return value.get();
}
```
Prior to this patch, we were setting the `curClaimEpoch` volatile before
scheduling the "bootstrapMetadata" event. This creates a race where the test
code can see the controller as active before the bootstrap metadata has been
enqueued. By moving the volatile write after the "bootstrapMetadata" event
enqueueing, we can eliminate this race.
In production, there is nothing stopping clients from making requests to a
controller that is in the process of starting up. Once the socket server begins
processing requests, we will begin seeing events on the QuorumController queue.
The check for active controller is done when events are processed, so we could
have events enqueued at any time. By prepending the "bootstrapMetadata" event,
we can ensure any events that see the controller as active will come after the
bootstrap metadata has been processed.
If we handle a "registerBroker" event before "bootstrapMetadata" is
enqueued, then we are sure to see `curClaimEpoch = -1`. If a "registerBroker"
is waiting in the queue while we finish bootstrapping, then we are sure to
enqueue "bootstrapMetadata" before it. This means "registerBroker" (and any
other event) is guaranteed to see the controller as inactive _or_ that it is
active and the bootstrap metadata has been written.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]