Andrew,

In SDC (https://github.com/streamsets/datacollector) we do the kind of offset management you mention to achieve this type of behavior (ideally, exactly-once processing). We still only give the user the choice of "at least once" and "at most once", though, because even when handling offsets this way the application can fail, leaving a (very small) possibility of a duplicate if the offset wasn't committed due to, for example, some transient error.
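
To make the duplicate window concrete, here is a minimal in-memory sketch of the commit-after-processing pattern. It is an illustration only: it is neither SDC's code nor the Kafka consumer API, and the class, field, and method names (CommitAfterProcessDemo, runOnce, crashBeforeCommitAt) are hypothetical. It assumes a single partition modelled as a list and a committed offset that survives a restart.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for one partition plus its committed offset.
class CommitAfterProcessDemo {
    private final List<String> log;                    // the "partition"
    private int committedOffset = 0;                   // survives a simulated restart
    final List<String> delivered = new ArrayList<>();  // what the downstream stage saw

    CommitAfterProcessDemo(List<String> log) {
        this.log = log;
    }

    // Reads from the last committed offset and commits only after a record
    // has been processed. If crashBeforeCommitAt equals the current offset,
    // the record is processed but its commit is lost (simulated failure).
    // Returns true if the run reached the end of the log, false if it crashed.
    boolean runOnce(int crashBeforeCommitAt) {
        for (int offset = committedOffset; offset < log.size(); offset++) {
            delivered.add(log.get(offset));      // 1. process the record
            if (offset == crashBeforeCommitAt) {
                return false;                    // crash: the commit never happens
            }
            committedOffset = offset + 1;        // 2. commit the next offset to read
        }
        return true;
    }

    public static void main(String[] args) {
        CommitAfterProcessDemo demo =
            new CommitAfterProcessDemo(Arrays.asList("a", "b", "c"));
        demo.runOnce(1);   // fails after processing "b", before its commit
        demo.runOnce(-1);  // restart from the committed offset, no failure
        System.out.println(demo.delivered);  // prints [a, b, b, c]
    }
}
```

No record is lost, but "b" is delivered twice because its commit never landed. That is why this ordering gives at-least-once rather than exactly-once unless the downstream processing is idempotent.
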
Specifically, you can check out
https://github.com/streamsets/datacollector/blob/9828e4ba5b90614316506c95784f43c471edc222/sdc-kafka_0_9/src/main/java/com/streamsets/pipeline/kafka/impl/KafkaConsumer09.java#L142-L173

We explicitly commit the offsets only once they've completed processing through the rest of the data pipeline.

Hope this helps!
-Adam

On Fri, Feb 19, 2016 at 1:49 PM, Andrew Schofield <andrew_schofi...@live.com> wrote:

> When publishing messages to Kafka, you make a choice between at-most-once
> and at-least-once delivery, depending on whether you wait for
> acknowledgments and whether you retry on failures. In most cases, those
> options are good enough. However, some systems offer exactly-once
> reliability too. Although my view is that the practical use of exactly-once
> is limited in the situations that Kafka is generally used for, when you're
> connecting other systems to Kafka or bridging between protocols, I think
> there is value in propagating the reliability level that the other system
> expects.
>
> As a consumer, you can manage your offset and get exactly-once delivery,
> or more likely exactly-once processing, of the messages.
>
> I've read about idempotent producers (
> https://cwiki.apache.org/confluence/display/KAFKA/Idempotent+Producer)
> and I know there's been some discussion about transactions too.
>
> Is there a plan to provide the tools to enable exactly-once publication
> behaviour? Is this a planned enhancement to Kafka Connect? Is there already
> some technique that people are using effectively to get exactly-once?
>
> Andrew Schofield

--
Adam Kunicki
StreamSets | Field Engineer
mobile: 415.890.DATA (3282) | linkedin <http://www.adamkunicki.com>