Please read the following:
https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-HowdoIgetexactly-oncemessagingfromKafka?
https://kafka.apache.org/0101/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html (Usage Examples).

It describes all the use cases. If you store offsets in ZK, you will not achieve exactly once.

On 22 December 2016 at 1:13:41 pm, kant kodali (kanth...@gmail.com) wrote:

Hi Hans,

That's a great answer compared to the paragraphs I read online! I am assuming you meant HDFS? What is JSDC? Any idea which is more common for this kind of use case? Also, can I store offsets in ZooKeeper using ZAB instead of using an external store? I am not sure how ZooKeeper stores data, but I keep reading that you can (perhaps ZooKeeper requires external storage?).

Thanks!

On Wed, Dec 21, 2016 at 5:11 PM, Hans Jespersen <h...@confluent.io> wrote:

> Exactly once Kafka Sink Connectors typically store the offset externally
> in the same atomic write as they store the messages. That way, after a
> crash, they can check the external store (HSFS, JSDC, etc), retrieve the
> last committed offset, seek to the next message, and continue processing
> with no duplicates and exactly once semantics.
>
> -hans
>
> > On Dec 21, 2016, at 4:39 PM, kant kodali <kanth...@gmail.com> wrote:
> >
> > How does Kafka emulate exactly once processing currently? Does it require
> > the producer to send at least once and the consumer to de-dupe?
> >
> > I did do my research, but I feel like I am going all over the place, so a
> > simple short answer would be great!
> >
> > Thanks!
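
The pattern Hans describes (writing the messages and the offset in the same atomic write, then resuming from the stored offset after a crash) can be sketched roughly as below. This is only an illustration and not how any particular connector is implemented. It assumes SQLite as a stand-in for the external store and a plain Python list as a stand-in for a Kafka partition. The names `open_store`, `next_offset`, `write_batch`, `run_sink` and the `crash_midway` / `crash_on_batch` switches are hypothetical and exist only for this example.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open the external store (SQLite here, purely as an assumption)."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        "partition_id INTEGER, msg_offset INTEGER, value TEXT, "
        "PRIMARY KEY (partition_id, msg_offset))"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS offsets ("
        "partition_id INTEGER PRIMARY KEY, next_offset INTEGER)"
    )
    conn.commit()
    return conn


def next_offset(conn, partition_id):
    """Return the offset to resume from (0 if nothing was committed yet)."""
    row = conn.execute(
        "SELECT next_offset FROM offsets WHERE partition_id = ?",
        (partition_id,),
    ).fetchone()
    return row[0] if row else 0


def write_batch(conn, partition_id, batch, crash_midway=False):
    """Store messages and the new offset in ONE transaction.

    batch is a list of (offset, value) pairs. crash_midway is a hypothetical
    switch that simulates a failure after the messages are written but
    before the offset is updated.
    """
    if not batch:
        return
    with conn:  # commits on success, rolls back if an exception escapes
        conn.executemany(
            "INSERT INTO messages VALUES (?, ?, ?)",
            [(partition_id, off, val) for off, val in batch],
        )
        if crash_midway:
            raise RuntimeError("simulated crash before offset update")
        conn.execute(
            "INSERT OR REPLACE INTO offsets VALUES (?, ?)",
            (partition_id, batch[-1][0] + 1),
        )


def run_sink(conn, log, partition_id=0, batch_size=2, crash_on_batch=None):
    """Drain the fake partition `log` into the store, batch by batch."""
    batch_no = 0
    while True:
        # With a real consumer this is where one would call
        # consumer.seek(partition, offset) using the stored offset.
        start = next_offset(conn, partition_id)
        end = min(start + batch_size, len(log))
        batch = [(off, log[off]) for off in range(start, end)]
        if not batch:
            return
        write_batch(conn, partition_id, batch,
                    crash_midway=(batch_no == crash_on_batch))
        batch_no += 1


if __name__ == "__main__":
    log = ["a", "b", "c", "d", "e"]
    conn = open_store()
    try:
        run_sink(conn, log, crash_on_batch=1)  # fails during the 2nd batch
    except RuntimeError as exc:
        print("sink stopped:", exc)
    run_sink(conn, log)  # "restart": resumes from the stored offset
    print(conn.execute(
        "SELECT msg_offset, value FROM messages ORDER BY msg_offset"
    ).fetchall())
```

Because the messages and the offset commit or roll back together, the half-written second batch disappears on failure, and the restarted sink re-reads it from offset 2 without producing duplicates. Storing the offset somewhere separate from the data (ZooKeeper, for instance) loses this property, since a crash can land between the two writes.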