Hello Victor,
thanks for reaching out. I am not sure it would be easy to implement what
you propose. The problem is that records might be partially processed
when an error happens, and before we can figure out what the correct
offset to commit is, KS needs to flush; but after we hit an error,
flushing cleanly might not be possible any longer.
Besides the above issue, w/o EOS (i.e., Kafka transactions) there are all
kinds of other scenarios that could lead to duplicate output or to
reprocessing an input record a second time. So even if we can implement
what you propose (we would need to investigate in more detail whether it
is possible or not), it could only reduce duplicates, not fully
eliminate them. Just want to make sure you are aware of this.
So if you are really interested in contributing such a feature, and you
can figure out how to do this correctly, then yes, please write a KIP
about it :) -- We can also try to help you with this investigation (it
might require some POC PR...); however, I am not sure how much time we
can find atm to support you, TBH.
However, I am wondering what kind of performance you need and why EOS
would not be able to deliver it. There are many configs, so maybe there
is a way to tune your app to make it work with EOS?
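For example (just a rough sketch; the exact values are assumptions and
depend on your workload), the knobs that usually matter most for EOS
throughput are the commit interval and producer batching:

  import java.util.Properties;
  import org.apache.kafka.clients.producer.ProducerConfig;
  import org.apache.kafka.streams.StreamsConfig;

  Properties props = new Properties();
  props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE_V2);
  // with EOS, commit.interval.ms defaults to 100ms; a larger value means larger
  // transactions and less commit overhead (but more reprocessing after a failure)
  props.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 1000);
  // producer batching helps to amortize the per-transaction overhead
  props.put(StreamsConfig.producerPrefix(ProducerConfig.LINGER_MS_CONFIG), 50);
  props.put(StreamsConfig.producerPrefix(ProducerConfig.BATCH_SIZE_CONFIG), 128 * 1024);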
-Matthias
On 9/3/25 10:50 AM, Victor Osorio wrote:
Hello everyone,
We’re currently using Kafka Streams to process transactional data with
*exactly-once semantics (EOS)*. However, for some of our workloads, we
require higher throughput, which makes EOS impractical.
To ensure data integrity, we rely
on UncaughtExceptionHandler and ProductionExceptionHandler to halt
stream processing upon any exception. This prevents data loss but
introduces a new challenge: when a thread stops due to an exception, it
doesn’t commit the records that were already successfully processed. As
a result, when the stream restarts, those records are reprocessed,
leading to duplication.
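To illustrate the setup (a rough sketch only; application id, bootstrap
servers, and the topology itself are placeholders):

  import java.util.Properties;
  import org.apache.kafka.streams.KafkaStreams;
  import org.apache.kafka.streams.StreamsBuilder;
  import org.apache.kafka.streams.StreamsConfig;
  import org.apache.kafka.streams.errors.DefaultProductionExceptionHandler;
  import org.apache.kafka.streams.errors.StreamsUncaughtExceptionHandler.StreamThreadExceptionResponse;

  Properties props = new Properties();
  props.put(StreamsConfig.APPLICATION_ID_CONFIG, "tx-processing-app");   // placeholder
  props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");   // placeholder
  props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.AT_LEAST_ONCE);
  // the default production exception handler already responds with FAIL,
  // so a failed send stops processing instead of being skipped
  props.put(StreamsConfig.DEFAULT_PRODUCTION_EXCEPTION_HANDLER_CLASS_CONFIG,
            DefaultProductionExceptionHandler.class);

  StreamsBuilder builder = new StreamsBuilder();
  // ... topology elided ...

  KafkaStreams streams = new KafkaStreams(builder.build(), props);
  // stop the whole application on any uncaught processing exception
  streams.setUncaughtExceptionHandler(
          exception -> StreamThreadExceptionResponse.SHUTDOWN_APPLICATION);
  streams.start();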
While reviewing the discussion around KIP-1033, I noticed the suggestion
to avoid exposing commit functionality in the Kafka Streams API
(https://lists.apache.org/thread/k4v0737tqjdnq5vl3yp9rjr4qzqoo306).
That makes sense in many contexts, but I’d like to revisit a related idea:
*Could we introduce a new shutdown mechanism, perhaps a “Graceful
Shutdown” API, that commits all successfully processed records while
skipping the one that caused the failure?*
This would allow us to maintain data integrity without sacrificing
throughput or introducing duplicates. I’m curious to hear your thoughts:
* Would this be possible to implement with current Kafka Streams APIs?
* Would it be possible, or desirable, to add this as a Kafka Streams
feature in a future release? If yes, we can open a KIP.
Looking forward to your insights and feedback.
Best regards,
Victor Osório