Hello everyone,

We're currently using Kafka Streams to process transactional data with 
exactly-once semantics (EOS). However, for some of our workloads, we require 
higher throughput, which makes EOS impractical.
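
For reference, the trade-off we're weighing comes down to the 
processing.guarantee setting (a minimal fragment, assuming "props" is the 
application's StreamsConfig properties):

    // Exactly-once (EOS v2): transactional commits, lower throughput
    props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG,
            StreamsConfig.EXACTLY_ONCE_V2);

    // At-least-once: higher throughput, but a restarted task can
    // reprocess records and emit duplicates
    // props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG,
    //         StreamsConfig.AT_LEAST_ONCE);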

To ensure data integrity, we rely on a StreamsUncaughtExceptionHandler and a 
ProductionExceptionHandler to halt stream processing upon any exception. This 
prevents data loss but introduces a new challenge: when a thread stops due to 
an exception, it doesn't commit the offsets of the records that were already 
successfully processed. As a result, when the stream restarts, those records 
are reprocessed, leading to duplication.
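
For concreteness, this is roughly how we wire it up today (a sketch, not our 
real code; the application ID, topic names, and handler class name are 
placeholders):

    import java.util.Map;
    import java.util.Properties;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.errors.ProductionExceptionHandler;
    import org.apache.kafka.streams.errors.StreamsUncaughtExceptionHandler.StreamThreadExceptionResponse;

    public class HaltOnErrorExample {

        // FAIL (rather than skip) any record that cannot be produced,
        // so the thread stops instead of silently dropping data
        public static class FailOnErrorHandler implements ProductionExceptionHandler {
            @Override
            public ProductionExceptionHandlerResponse handle(
                    final ProducerRecord<byte[], byte[]> record,
                    final Exception exception) {
                return ProductionExceptionHandlerResponse.FAIL;
            }

            @Override
            public void configure(final Map<String, ?> configs) { }
        }

        public static void main(final String[] args) {
            final Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "halt-on-error-example");
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
            props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
            props.put(StreamsConfig.DEFAULT_PRODUCTION_EXCEPTION_HANDLER_CLASS_CONFIG,
                    FailOnErrorHandler.class);

            final StreamsBuilder builder = new StreamsBuilder();
            builder.stream("input-topic").to("output-topic"); // placeholder topology

            final KafkaStreams streams = new KafkaStreams(builder.build(), props);

            // Stop the whole application on any uncaught exception, so no
            // offsets past the failing record are committed
            streams.setUncaughtExceptionHandler(
                    exception -> StreamThreadExceptionResponse.SHUTDOWN_APPLICATION);

            streams.start();
        }
    }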

While reviewing the discussion around KIP-1033, I noticed the suggestion to 
avoid exposing commit functionality in the Kafka Streams API 
(https://lists.apache.org/thread/k4v0737tqjdnq5vl3yp9rjr4qzqoo306). That makes 
sense in many contexts, but I'd like to revisit a related idea:
Could we introduce a new shutdown mechanism, perhaps a "Graceful Shutdown" API, 
that commits the offsets of all successfully processed records while skipping 
the record that caused the failure?
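
To make the idea concrete, usage might look something like the following. This 
is purely hypothetical: SHUTDOWN_APPLICATION_GRACEFULLY does not exist in the 
current API, and the name is only an illustration of the proposal:

    // HYPOTHETICAL -- not part of Kafka Streams today. On an uncaught
    // exception, commit the offsets of every record fully processed
    // before the failure, skip the failing record, then shut down.
    streams.setUncaughtExceptionHandler(exception ->
            StreamThreadExceptionResponse.SHUTDOWN_APPLICATION_GRACEFULLY);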

This would allow us to maintain data integrity without sacrificing throughput 
or introducing duplicates. I'm curious to hear your thoughts:

  *   Would this be possible to implement with the current Kafka Streams APIs?
  *   Would it be possible, or desirable, to add this as a Kafka Streams 
feature in a future release? If so, we can open a KIP.

Looking forward to your insights and feedback.

Best regards,
Victor Osório

