gharris1727 opened a new pull request, #15154: URL: https://github.com/apache/kafka/pull/15154
The single ProcessingContext in the RetryWithToleranceOperator is subject to a data race when interacted with across multiple threads. Specifically, when the main task thread is converting records, an asynchronous error (from producer or errant record reporter) can intervene, and cause a non-problematic record to be incorrectly dropped silently. To fix this, make ProcessingContext a per-record object which is passed between threads (task-main-thread -> producer io thread/task-main-thread -> task internal thread) but never accessed concurrently. This has a number of benefits: 1. No locking on the happy-path, where before every RWTO operation was synchronized (failures are still synchronized to collect the totalFailures & report errors) 2. Less mutability in the ProcessingContext which is easier to reason about 3. Easier to add parallelism to the transformation chain in the future This comes at the cost of additional memory pressure from allocating more ProcessingContext objects, which should be negligible compared to the ConnectRecords themselves, and comparable to the existing InternalSinkRecord. ### Committer Checklist (excluded from commit message) - [ ] Verify design and implementation - [ ] Verify test coverage and CI build status - [ ] Verify documentation (including upgrade notes) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org