Hi Divij,

I have limited experience tackling bugs in Kafka, but the issues I've fixed were usually found through:
1. Flaky unit/integration/system tests
2. Insufficient test coverage: when trying to add tests...
3. Related to #2: found in production at scale
I have heard that KRaft recently introduced simulation testing, and we might plan to do the same for the client-side changes (KIP-848), because it removes network communication from the tests and really exercises the integration.

Just some thoughts,
P

On Tue, Oct 24, 2023 at 2:32 AM Divij Vaidya <divijvaidy...@gmail.com> wrote:
> Hey folks
>
> We recently came across a bug [1] which was very hard to detect during
> testing and easy to introduce during development. I would like to kick
> start a discussion on potential ways which could avoid this category of
> bugs in Apache Kafka.
>
> I think we might want to start working towards a "debug" mode in the broker
> which will enable assertions for different invariants in Kafka. Invariants
> could be derived from formal verification that Jack [2] and others have
> shared with the community earlier AND from tribal knowledge in the
> community such as network threads should not perform any storage IO, files
> should not fsync in critical product path, metric gauges should not acquire
> a lock etc. The release qualification process (system tests + integration
> tests) will run the broker in "debug" mode and will validate these
> assertions while testing the system in different scenarios. The inspiration
> for this idea is derived from Marc Brooker's post at
> https://brooker.co.za/blog/2023/07/28/ds-testing.html
>
> Your thoughts on this topic are welcome! Also, please feel free to take
> this idea forward and draft a KIP for a more formal discussion.
>
> [1] https://issues.apache.org/jira/browse/KAFKA-15653
> [2] https://lists.apache.org/thread/pfrkk0yb394l5qp8h5mv9vwthx15084j
>
> --
> Divij Vaidya
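A minimal Java sketch of the "debug" mode idea from the quoted message, covering just one of the listed invariants (network threads should not perform storage IO). Everything here is hypothetical and not part of Kafka: the `InvariantChecker` class, its methods, and the `kafka.debug.invariants` system property are invented for illustration. It also assumes network threads register themselves when they start, rather than relying on Kafka's actual thread names.

```java
/**
 * Hypothetical debug-mode invariant checker; not an Apache Kafka API.
 * Checks run only when enabled, e.g. by starting the JVM with
 * -Dkafka.debug.invariants=true (an invented property name).
 */
public final class InvariantChecker {
    // Off by default, so production runs pay only for a volatile read and a branch.
    private static volatile boolean enabled = Boolean.getBoolean("kafka.debug.invariants");

    // Per-thread flag set by threads that identify themselves as network threads.
    private static final ThreadLocal<Boolean> NETWORK_THREAD =
            ThreadLocal.withInitial(() -> Boolean.FALSE);

    private InvariantChecker() {}

    /** Lets a test harness toggle the checks at runtime. */
    public static void setEnabled(boolean value) {
        enabled = value;
    }

    /** Each network thread would call this once when its run loop starts. */
    public static void markCurrentThreadAsNetworkThread() {
        NETWORK_THREAD.set(Boolean.TRUE);
    }

    /**
     * Invariant: network threads must not perform storage IO.
     * A storage code path (segment append, fsync, ...) would call this before touching disk.
     */
    public static void assertNoStorageIo(String operation) {
        if (enabled && NETWORK_THREAD.get()) {
            throw new IllegalStateException("Invariant violated: storage IO '" + operation
                    + "' attempted on network thread " + Thread.currentThread().getName());
        }
    }
}
```

In the release-qualification runs described above, system and integration tests would start brokers with the property set, so a violation surfaces as a test failure instead of a latent production bug. Other invariants from the list (no fsync on the critical path, no locks in metric gauges) would follow the same pattern: a cheap guard that is inert unless debug mode is on.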