cmccabe opened a new pull request, #12595: URL: https://github.com/apache/kafka/pull/12595
Originally, the QuorumController did not try to limit the number of records in a batch that it sent to the Raft layer. This caused two problems. Firstly, we were not correctly handling the exception that was thrown by the Raft layer when a batch of records was too large to apply atomically. This happened because the Raft layer threw an exception which was a subclass of ApiException. Secondly, by letting the Raft layer split non-atomic batches, we were not able to create snapshots at each of the splits. This led to O(N) behavior during controller failovers. This PR fixes both of these issues by limiting the number of records in a batch. Atomic batches that are too large will fail with a RuntimeException which will cause the active controller to become inactive and revert to the last committed state. Non-atomic batches will be split into multiple batches with a fixed number of records in each. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org