Hi Divij,

I have limited experience tackling bugs in Kafka, but the issues I've
fixed were usually found through:
1. Flaky unit/integration/system tests
2. Insufficient test coverage, noticed when trying to add tests...
3. Production at scale (related to #2)

I have heard KRaft recently introduced simulation testing, and we might plan
to do the same for the client-side changes (KIP-848), because it removes
network communication from the tests and really tests the integration.
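
To illustrate the general idea of taking the network out of a test, here is
a minimal sketch in Java. It is not the KRaft simulation test code; the class
and method names (SimulatedNetwork, register, send, step, runUntilIdle) are
hypothetical. Messages are held in memory and delivered in a pseudo-random
order driven by a seed, so a failing run can be replayed with the same seed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.function.Consumer;

// Hypothetical sketch of an in-memory "network"; not part of Apache Kafka.
final class SimulatedNetwork {
    // A pending message addressed to a registered node id.
    private static final class Envelope {
        final int to;
        final String payload;

        Envelope(int to, String payload) {
            this.to = to;
            this.payload = payload;
        }
    }

    private final List<Envelope> inFlight = new ArrayList<>();
    private final List<Consumer<String>> nodes = new ArrayList<>();
    private final Random random;

    // The seed fully determines the delivery order for a given sequence of sends.
    SimulatedNetwork(long seed) {
        this.random = new Random(seed);
    }

    // Registers a message handler and returns the node id assigned to it.
    int register(Consumer<String> handler) {
        nodes.add(handler);
        return nodes.size() - 1;
    }

    // Queues a message instead of writing it to a socket.
    void send(int to, String payload) {
        inFlight.add(new Envelope(to, payload));
    }

    // Delivers one pending message chosen pseudo-randomly.
    // Returns false when nothing is pending.
    boolean step() {
        if (inFlight.isEmpty()) {
            return false;
        }
        Envelope next = inFlight.remove(random.nextInt(inFlight.size()));
        nodes.get(next.to).accept(next.payload);
        return true;
    }

    // Keeps delivering until no messages remain, including ones sent by handlers.
    void runUntilIdle() {
        while (step()) {
            // each iteration delivers exactly one message
        }
    }
}
```

A test would register the components under test as handlers, send the initial
requests, call runUntilIdle(), and then check the resulting state. Reordering
comes from the seeded Random, so no real sockets or timing are involved.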

Just some thoughts,
P

On Tue, Oct 24, 2023 at 2:32 AM Divij Vaidya <divijvaidy...@gmail.com>
wrote:

> Hey folks
>
> We recently came across a bug [1] which was very hard to detect during
> testing and easy to introduce during development. I would like to kick
> start a discussion on potential ways which could avoid this category of
> bugs in Apache Kafka.
>
> I think we might want to start working towards a "debug" mode in the broker
> which will enable assertions for different invariants in Kafka. Invariants
> could be derived from formal verification that Jack [2] and others have
> shared with the community earlier AND from tribal knowledge in the
> community such as network threads should not perform any storage IO, files
> should not fsync in critical product path, metric gauges should not acquire
> a lock etc. The release qualification process (system tests + integration
> tests) will run the broker in "debug" mode and will validate these
> assertions while testing the system in different scenarios. The inspiration
> for this idea is derived from Marc Brooker's post at
> https://brooker.co.za/blog/2023/07/28/ds-testing.html
>
> Your thoughts on this topic are welcome! Also, please feel free to take
> this idea forward and draft a KIP for a more formal discussion.
>
> [1] https://issues.apache.org/jira/browse/KAFKA-15653
> [2] https://lists.apache.org/thread/pfrkk0yb394l5qp8h5mv9vwthx15084j
>
> --
> Divij Vaidya
>

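As a rough illustration of the "debug" mode described in the quoted message,
the sketch below shows one way an invariant such as "network threads should
not perform any storage IO" could be checked only when a debug switch is on.
Everything here is an assumption made for the example: the DebugInvariants
class, the "kafka.debug.invariants" system property, and the
"network-thread-" name prefix are hypothetical and are not existing Kafka
APIs or configs.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper for debug-mode invariant checks; not part of Apache Kafka.
final class DebugInvariants {
    // Assumed switch: a JVM system property, e.g. -Dkafka.debug.invariants=true.
    private static volatile boolean enabled = Boolean.getBoolean("kafka.debug.invariants");

    private DebugInvariants() {
    }

    // Lets tests or a startup hook turn the checks on or off.
    static void setEnabled(boolean value) {
        enabled = value;
    }

    // Evaluates the condition only when debug mode is on, so the check is
    // close to free in normal operation. Throws if the invariant is violated.
    static void check(BooleanSupplier condition, String description) {
        if (enabled && !condition.getAsBoolean()) {
            throw new IllegalStateException("Invariant violated: " + description);
        }
    }

    // Example invariant: storage IO must not run on a network thread.
    // Assumes network threads are recognisable by a name prefix (an assumption
    // of this sketch, not a statement about the broker's thread naming).
    static void assertNotOnNetworkThread(String operation) {
        check(
            () -> !Thread.currentThread().getName().startsWith("network-thread-"),
            operation + " must not run on a network thread");
    }
}
```

A storage code path would call something like
DebugInvariants.assertNotOnNetworkThread("fsync") before doing the IO. With
the switch off the call does nothing, and with it on (for example during
system and integration tests) a violation fails the test immediately.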