Hello all, I have a question about Kafka Streams, which I’m evaluating
for a new project. (I know it’s still a work in progress but it might
work anyway for this project.)

I’m new to Kafka in particular and distributed systems in general so
please forgive me if I’m confused about any of these concepts.

From reading the docs on the new consumer API, I have the impression
that letting the consumer auto-commit is roughly akin to at-most-once
delivery, because a commit could occur past a record that wasn’t
actually processed. So in order to achieve at-least-once delivery, one
needs to employ “manual offset control” and explicitly commit after
processing has succeeded.
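
For concreteness, here is roughly what I mean by “manual offset
control”, pieced together from the consumer docs (the topic name,
group id, and the process() helper are just placeholders I made up):

import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class AtLeastOnceConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "my-group");
        props.put("enable.auto.commit", "false"); // no auto-commit
        props.put("key.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("events"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(100);
                for (ConsumerRecord<String, String> record : records) {
                    process(record); // placeholder for the real work
                }
                // Commit only after the batch has been fully processed;
                // a crash before this line means re-delivery, i.e.
                // at-least-once rather than at-most-once.
                consumer.commitSync();
            }
        }
    }

    private static void process(ConsumerRecord<String, String> record) {
        System.out.println(record.key() + " -> " + record.value());
    }
}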

If I’ve got that right, then that leads me to my question about KStreams.

From looking at the word count examples, it seems pretty clear that
using the “lower level” approach demonstrated in WordCountProcessorJob
— wherein a TopologyBuilder is created and supplied with a
ProcessorSupplier that supplies a Processor that receives a
ProcessorContext and calls commit on that ProcessorContext once
processing succeeds — enables the at-least-once delivery model. OK,
cool.
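
In other words, something shaped roughly like this (loosely adapted
from WordCountProcessorJob — I realize the exact class and method
names may still change while this is a work in progress):

import org.apache.kafka.streams.processor.Processor;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.processor.ProcessorSupplier;
import org.apache.kafka.streams.processor.TopologyBuilder;

public class ExplicitCommitExample {

    static class MyProcessor implements Processor<String, String> {
        private ProcessorContext context;

        @Override
        public void init(ProcessorContext context) {
            this.context = context;
        }

        @Override
        public void process(String key, String value) {
            // ... do the actual work for this record here ...
            context.commit(); // ask for a commit once processing succeeded
        }

        @Override
        public void punctuate(long timestamp) {}

        @Override
        public void close() {}
    }

    public static TopologyBuilder buildTopology() {
        TopologyBuilder builder = new TopologyBuilder();
        builder.addSource("source", "input-topic");
        builder.addProcessor("my-processor",
                new ProcessorSupplier<String, String>() {
                    @Override
                    public Processor<String, String> get() {
                        return new MyProcessor();
                    }
                },
                "source");
        return builder;
    }
}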

Looking at WordCountJob, however, which uses KStreams, I don’t see any
committing happening there explicitly, and in fact I searched the
entire kstream source tree (for the kstream package and its internals
sub-package) and I don’t see any calls to commit there. So my
_impression_ is that a KStream topology can only rely on auto-commit,
and therefore can only offer at-most-once processing.
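
For contrast, here is the kind of DSL pipeline I mean, sketched from
my possibly out-of-date reading of the KStream API (topic names and
configs are made up); note that there is nowhere for user code to
call commit:

import java.util.Arrays;
import java.util.Properties;

import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KStreamBuilder;

public class DslNoExplicitCommit {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("application.id", "wordcount-sketch");
        props.put("bootstrap.servers", "localhost:9092");
        // plus whatever key/value serde or deserializer configs the
        // current build expects (hedging here, since that still seems
        // to be in flux)

        KStreamBuilder builder = new KStreamBuilder();
        KStream<String, String> lines = builder.stream("input-topic");
        lines.flatMapValues(value ->
                 Arrays.asList(value.toLowerCase().split("\\W+")))
             .to("words-topic");

        // No commit() anywhere above: offsets are managed internally.
        new KafkaStreams(builder, props).start();
    }
}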

Basically I’m wondering whether my impression is correct, or whether
I’m just totally misunderstanding the code in its current state.

Thanks!
Avi
