Bruno,

Thanks for the info! That makes sense. Of course now I have more questions. :)

Do you know why this is being done outside of the state store API? I assume
there are reasons why a "deleteAll()" type of function wouldn't work, thereby
allowing a state store to purge itself? And maybe more importantly, is there a
way to achieve similar behavior with a 3rd-party store? Could hooking into the
uncaught exception handler be a good way to purge/drop a state store in the
event of a fatal error?

I did set up a MongoDB state store recently as part of a POC and was testing
it with EOS enabled (forcing crashes to occur and checking that the result of
my aggregation was still accurate). I was unable to cause inconsistent data in
the Mongo store (which is good!), though of course I may just have been
getting lucky.

Thanks again for your help,
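For what it's worth, since KIP-671 (Kafka 2.8) there is a real hook for this:
`KafkaStreams#setUncaughtExceptionHandler(StreamsUncaughtExceptionHandler)`.
Below is a dependency-free sketch of the wipe-on-fatal-error idea I had in
mind; the `ExternalStore` interface, `deleteAll()`, and `handleFatal()` are
hypothetical stand-ins, not Kafka APIs, so treat this as a shape of the
pattern rather than working Streams code:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a 3rd-party store client (e.g. MongoDB, Redis).
interface ExternalStore {
    void put(String key, String value);
    String get(String key);
    void deleteAll();   // purge everything this instance has written
}

class InMemoryExternalStore implements ExternalStore {
    private final Map<String, String> data = new HashMap<>();
    public void put(String key, String value) { data.put(key, value); }
    public String get(String key) { return data.get(key); }
    public void deleteAll() { data.clear(); }
    public int size() { return data.size(); }
}

public class WipeOnFatalError {
    // Mimics what a StreamsUncaughtExceptionHandler (KIP-671) could do:
    // on a fatal error, purge the external store so the task rebuilds its
    // state from the changelog topic on restart, instead of resuming from
    // possibly half-committed state.
    static void handleFatal(Throwable t, ExternalStore store) {
        store.deleteAll();
        // A real handler would then return SHUTDOWN_CLIENT or REPLACE_THREAD.
    }

    public static void main(String[] args) {
        InMemoryExternalStore store = new InMemoryExternalStore();
        store.put("count:key-a", "42");   // state written mid-transaction
        try {
            throw new IllegalStateException("simulated fatal processing error");
        } catch (IllegalStateException e) {
            handleFatal(e, store);        // purge possibly-dirty state
        }
        System.out.println("entries after wipe: " + store.size());
    }
}
```

One caveat with a shared remote store: a blanket wipe would also destroy state
owned by other, healthy instances, so in practice you'd probably want to scope
the delete to this instance's tasks/partitions (e.g. by key prefix).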
Alex

On Mon, Mar 15, 2021 at 8:59 AM Pushkar Deole <pdeole2...@gmail.com> wrote:

> Bruno,
>
> i tried to explain this in 'kafka user's language' through the above
> mentioned scenario, hope i put it properly :-) and correct me if i am wrong
>
> On Mon, Mar 15, 2021 at 7:23 PM Pushkar Deole <pdeole2...@gmail.com>
> wrote:
>
> > This is what I understand could be the issue with an external state
> > store:
> >
> > The kafka streams application consumes the source topic, does processing,
> > stores state to the kafka state store (this is backed by a topic) and,
> > before producing the event on the destination topic, the application
> > fails with some issue. In this case the transaction has failed, so kafka
> > guarantees all or none: the offset committed for the source topic, the
> > state written to the state store topic, and the output produced on the
> > destination topic all happen or none of them do, and in this failure
> > scenario it is none of them.
> >
> > Assume you have a redis state store, and you updated the state in redis
> > and the streams application failed. Now your source topic and destination
> > topic are consistent, i.e. the offset is not committed to the source
> > topic and the output is not produced on the destination topic, but your
> > redis state store is inconsistent with that, since it is an external
> > state store and kafka can't guarantee rollback of the state written
> > there.
> >
> > On Mon, Mar 15, 2021 at 6:30 PM Alex Craig <alexcrai...@gmail.com>
> > wrote:
> >
> > > "Another issue with 3rd party state stores could be violation of the
> > > exactly-once guarantee provided by kafka streams in the event of a
> > > failure of a streams application instance"
> > >
> > > I've heard this before but would love to know more about how a custom
> > > state store would be at any greater risk than RocksDB as far as
> > > exactly-once guarantees are concerned. They all implement the same
> > > interface, so as long as you're correctly implementing get(), put(),
> > > delete(), flush(), etc., you should be fine, right?
> > > In other words, I don't think there is any special "exactly-once
> > > magic" baked into the RocksDB store code. I could be wrong though, so
> > > I'd love to hear people's thoughts. Thanks,
> > >
> > > Alex C
> > >
> > > On Sun, Mar 14, 2021 at 4:58 PM Parthasarathy, Mohan
> > > <mpart...@hpe.com> wrote:
> > >
> > > > Thanks for the responses. In the worst case, I might have to keep
> > > > both rocksdb for the local store and an external store like Redis.
> > > >
> > > > -mohan
> > > >
> > > > On 3/13/21, 8:53 PM, "Pushkar Deole" <pdeole2...@gmail.com> wrote:
> > > >
> > > > > Another issue with 3rd party state stores could be violation of
> > > > > the exactly-once guarantee provided by kafka streams in the event
> > > > > of a failure of a streams application instance.
> > > > > Kafka provides an exactly-once guarantee for consume -> process ->
> > > > > produce through transactions, and with the use of a state store
> > > > > like redis this guarantee is weaker.
> > > > >
> > > > > On Sat, Mar 13, 2021 at 5:28 AM Guozhang Wang <wangg...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hello Mohan,
> > > > > >
> > > > > > I think what you had in mind works with Redis: since it is a
> > > > > > remote state store engine, it does not have the co-partitioning
> > > > > > requirements of local state stores.
> > > > > >
> > > > > > One thing you'd need to tune in KS though is that with remote
> > > > > > stores the processing latency may be larger, and since Kafka
> > > > > > Streams processes all records of a single partition in order,
> > > > > > synchronously, you may need to tune the poll interval configs
> > > > > > etc. to make sure KS stays in the consumer group and does not
> > > > > > trigger unnecessary rebalances.
> > > > > > Guozhang
> > > > > >
> > > > > > On Thu, Mar 11, 2021 at 6:41 PM Parthasarathy, Mohan
> > > > > > <mpart...@hpe.com> wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > I have a use case where messages come in with some key, get
> > > > > > > assigned some partition, and the state gets created. Later,
> > > > > > > the key changes (but the message still contains the old key)
> > > > > > > and gets sent to a different partition. I want to be able to
> > > > > > > grab the old state using the old key before creating the new
> > > > > > > state on this instance. Redis as a state store makes it easy
> > > > > > > to implement this, where I can simply do a lookup before
> > > > > > > creating the state. I see an implementation here:
> > > > > > >
> > > > > > > https://github.com/andreas-schroeder/redisks/tree/master/src/main/java/com/github/andreas_schroeder/redisks
> > > > > > >
> > > > > > > Has anyone tried this? Any caveats?
> > > > > > >
> > > > > > > Thanks
> > > > > > > Mohan
> > > > > >
> > > > > > --
> > > > > > -- Guozhang
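To make Guozhang's tuning advice concrete, here is roughly what the relevant
settings look like. The property names below are real Kafka config keys; the
values are only illustrative starting points, not recommendations, and should
be sized to your store's round-trip latency:

```properties
# Streams-level: enable EOS (this is the pre-2.8 name; 2.8+ adds exactly_once_v2)
processing.guarantee=exactly_once

# Consumer-level: allow more wall-clock time between poll() calls when every
# record does a remote round trip to the external store
max.poll.interval.ms=600000

# Fetch fewer records per poll so each batch finishes within the interval
max.poll.records=100

# Keep heartbeats comfortably under the session timeout to avoid rebalances
session.timeout.ms=30000
heartbeat.interval.ms=10000
```

In a Streams app the consumer-level keys can be set directly on the
StreamsConfig properties (or with the `consumer.` prefix) and are passed
through to the embedded consumer.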