Well, that is exactly what I mean by "it depends on the state store implementation".
For this case, you cannot get exactly-once. There are actually ideas to improve the implementation to support the case you describe, but there is no timeline for this change yet. Not even sure if there is already a Jira ticket... -Matthias On 1/6/21 2:32 AM, Pushkar Deole wrote: > The question is if we want to use state store of 3rd party, e.g. say Redis, > how can the store be consistent with rest of the system i.e. source and > destination topics... > > e.g. record is consumed from source, processed, state store updated with > some state, but before writing to destination there is failure > Now, in this case, with kafka state store, it will be wiped off the state > stored since the transaction failed. > > But with Redis, the state store is updated with the new state and there is > no way to revert back > > On Tue, Jan 5, 2021 at 11:11 PM Matthias J. Sax <mj...@apache.org> wrote: > >> It depends on the store implementation. Atm, EOS for state store is >> achieved by re-creating the state store in case of failure from the >> changelog topic. >> >> For RocksDB stores, we wipe out the local state directories and create a >> new empty RocksDB and for in-memory stores the content is "lost" anyway >> when state is migrated, and we reply the changelog into an empty store >> before processing resumes. >> >> >> -Matthias >> >> On 1/5/21 6:27 AM, Alex Craig wrote: >>> I don't think he's asking about data-loss, but rather data consistency. >>> (in the event of an exception or application crash, will EOS ensure that >>> the state store data is consistent) My understanding is that it DOES >> apply >>> to state stores as well, in the sense that a failure during processing >>> would mean that the commit wouldn't get flushed and therefore wouldn't >> get >>> double-counted once processing resumes and message is re-processed. >>> As far as using something other than RocksDB, I think as long as you are >>> implementing the state store API correctly you should be fine. I did a >> POC >>> recently using Mongo state-stores with EOS enabled and it worked >> correctly, >>> even when I intentionally introduced failures and crashes. >>> >>> -alex >>> >>> On Tue, Jan 5, 2021 at 1:09 AM Ning Zhang <ning2008w...@gmail.com> >> wrote: >>> >>>> If there is a "change-log" topic to back up the state store, then it may >>>> not lose data. >>>> >>>> Also, if the third party store is not "kafka community certified" (or >> not >>>> well-maintained), it may have chances to lose data (in different ways). >>>> >>>> >>>> >>>> On 2021/01/05 04:56:12, Pushkar Deole <pdeole2...@gmail.com> wrote: >>>>> In case we opt to choose some third party store instead of kafka's >> stores >>>>> for storing state (e.g. Redis cache or Ignite), then will we lose the >>>>> exactly-once guarantee provided by kafka and the state stores can be in >>>> an >>>>> inconsistent state ? >>>>> >>>>> On Sat, Jan 2, 2021 at 4:56 AM Ning Zhang <ning2008w...@gmail.com> >>>> wrote: >>>>> >>>>>> The physical store behind "state store" is change-log kafka topic. In >>>>>> Kafka stream, if something fails in the middle, the "state store" is >>>>>> restored back to the state before the event happens at the first step >> / >>>>>> beginning of the stream. >>>>>> >>>>>> >>>>>> >>>>>> On 2020/12/31 08:48:16, Pushkar Deole <pdeole2...@gmail.com> wrote: >>>>>>> Hi All, >>>>>>> >>>>>>> We use Kafka streams and may need to use exactly-once configuration >>>> for >>>>>>> some of the use cases. Currently, the application uses either local >>>> or >>>>>>> global state store to store state. >>>>>>> So, the application will consume events from source kafka topic, >>>> process >>>>>>> the events, for state stores it will use either local or global state >>>>>> store >>>>>>> of kafka, then produce events onto the destination topic. >>>>>>> >>>>>>> Question i have is: in the case of exactly-once setting, kafka >>>> streams >>>>>>> guarantees that all actions happen or nothing happens. So, in this >>>> case, >>>>>>> any state stored on the local or global state store will also be >>>> counted >>>>>>> under 'all or nothing' guarantee e.g. if event is consumed and state >>>>>> store >>>>>>> is updated, however some issue occurs before event is produced on >>>>>>> destination topic then will state store be restored back to the state >>>>>>> before it was updated for this event? >>>>>>> >>>>>> >>>>> >>>> >>> >> >