Hello all, I have a question about Kafka Streams, which I’m evaluating for a new project. (I know it’s still a work in progress, but it might work for this project anyway.)
I’m new to Kafka in particular and distributed systems in general, so please forgive me if I’m confused about any of these concepts.

From reading the docs on the new consumer API, I have the impression that letting the consumer auto-commit is roughly akin to at-most-once delivery, because a commit could occur past a record that wasn’t actually processed. So in order to achieve at-least-once delivery, one needs to employ “manual offset control” and explicitly commit only after processing has succeeded. If I’ve got that right, it leads me to my question about KStreams.

From looking at the word count examples, it seems pretty clear that the “lower level” approach demonstrated in WordCountProcessorJob (a TopologyBuilder is created and supplied with a ProcessorSupplier, which supplies a Processor that receives a ProcessorContext and calls commit on that ProcessorContext once processing succeeds) enables the at-least-once delivery model. OK, cool.

Looking at WordCountJob, however, which uses KStreams, I don’t see any explicit committing happening there, and in fact I searched the entire kstream source tree (the kstream package and its internals sub-package) and found no calls to commit there either. So my _impression_ is that a KStream topology can only use auto-commit, and therefore only at-most-once processing.

Basically I’m wondering whether that impression is correct, or whether I’m just misunderstanding the code in its current state.

Thanks! Avi
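P.S. For concreteness, here is roughly the manual-commit pattern I have in mind with the plain new consumer. This is just a sketch based on my reading of the consumer docs (the topic, group id, and process() helper are made up by me), so I may have details wrong:

import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class AtLeastOnceLoop {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "my-group");
        // turn off auto-commit so offsets are only committed after processing succeeds
        props.put("enable.auto.commit", "false");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Arrays.asList("my-topic"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(100);
                for (ConsumerRecord<String, String> record : records) {
                    // application logic; if we crash before committing, these records
                    // are re-delivered on restart, hence at-least-once
                    process(record);
                }
                // commit only after the whole batch has been processed
                consumer.commitSync();
            }
        }
    }

    private static void process(ConsumerRecord<String, String> record) {
        System.out.printf("%s -> %s%n", record.key(), record.value());
    }
}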
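And here is the rough shape of the “lower level” Processor approach I was describing, as I understand it from WordCountProcessorJob. Again, just a sketch from my reading of the current code (the class name and the placement of the commit call are mine), so apologies if I’ve gotten any of it wrong:

import org.apache.kafka.streams.processor.Processor;
import org.apache.kafka.streams.processor.ProcessorContext;

// a Processor that requests a commit once each record has been handled
public class MyProcessor implements Processor<String, String> {

    private ProcessorContext context;

    @Override
    public void init(ProcessorContext context) {
        this.context = context;
    }

    @Override
    public void process(String key, String value) {
        // ... do the actual work for this record here ...

        // ask the framework to commit now that processing has succeeded
        context.commit();
    }

    @Override
    public void punctuate(long timestamp) {
        // no periodic work in this sketch
    }

    @Override
    public void close() {
        // nothing to clean up in this sketch
    }
}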