If I understand correctly, Kafka is not designed to provide a replicated caching mechanism in which updates to the cache are synchronous across multiple cache instances.
On Sun, May 3, 2020 at 10:49 PM Pushkar Deole <pdeole2...@gmail.com> wrote:

> Thanks John.
>
> Actually, this is a normal consumer-producer application in which there are
> 2 consumers (an admin consumer and a main consumer) consuming messages from
> 2 different topics.
> One of the consumers consumes messages from an admin topic and populates
> data in a cache, e.g. the first name and last name received for agent id 10
> are stored in the cache. When the other consumer consumes a message with
> agent id 10, it reads the cache, appends the first name and last name, and
> then sends the enriched event to the producer.
> In this case, each application instance consumes all the events from the
> admin topic (unique consumer id) and keeps them in an in-memory cache.
> Now the requirement is to persist the cache and make it shared between the
> application instances, so each instance would consume partitions of the
> admin topic and write to the admin cache.
>
> If we want to use Kafka Streams, the application is so far evolved that it
> is difficult to migrate to Streams at this stage. Secondly, from past mail
> chains, Streams also won't serve the requirement, since local state stores
> would just hold the local state of admin data, and the cache written by
> each instance won't be shared with the other instances.
>
> Global state stores may help, but again they require writing to the topic,
> which is then synced to the state stores in the instances, and the
> instances may not be in sync with each other.
> I am not sure whether this would cause any inconsistencies, since I don't
> know how the events would flow from the source. E.g. if admin data is
> consumed by one instance, which then modified the topic, but the topic is
> not yet synced to all the global state stores, and the next event arrives
> at the main consumer on a different instance which tries to read from the
> store cache, then it doesn't get the data, and the event is passed on
> without enriched data.
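The enrichment flow described above can be sketched in plain Java. This is a minimal illustration, not the poster's actual code: the class name, the use of a `ConcurrentHashMap`, and the event shapes are assumptions, and the Kafka polling loops for the two consumers are elided.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the two-consumer enrichment cache described above.
// Names and event shapes are illustrative assumptions.
public class EnrichmentCache {
    // Admin data keyed by agent id, shared between the two consumer threads.
    private final Map<Integer, String[]> agentsById = new ConcurrentHashMap<>();

    // Called by the admin consumer for each record from the admin topic.
    public void onAdminEvent(int agentId, String firstName, String lastName) {
        agentsById.put(agentId, new String[] { firstName, lastName });
    }

    // Called by the main consumer: enrich the event with cached names, or
    // pass it through unenriched when this instance has no entry yet
    // (the consistency gap the thread is discussing).
    public String enrich(int agentId, String payload) {
        String[] names = agentsById.get(agentId);
        if (names == null) {
            return payload; // admin data not (yet) cached on this instance
        }
        return payload + ",firstName=" + names[0] + ",lastName=" + names[1];
    }

    public static void main(String[] args) {
        EnrichmentCache cache = new EnrichmentCache();
        cache.onAdminEvent(10, "Jane", "Doe");
        System.out.println(cache.enrich(10, "call=123,agent=10"));
        // prints "call=123,agent=10,firstName=Jane,lastName=Doe"
        System.out.println(cache.enrich(11, "call=124,agent=11"));
        // prints "call=124,agent=11" (no admin data for agent 11 yet)
    }
}
```

Because each instance fills `agentsById` only from the partitions (or, with a unique consumer id, all partitions) it consumes itself, the map is per-instance state; this is exactly why the poster is looking for a shared or replicated alternative.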
> That's pretty much the use case.
>
>
> On Sun, May 3, 2020 at 9:42 PM John Roesler <vvcep...@apache.org> wrote:
>
>> Hi Pushkar,
>>
>> I've been wondering if we should add writable tables to the Streams API.
>> Can you explain more about your use case and how it would integrate with
>> your application?
>>
>> Incidentally, this would also help us provide more concrete advice.
>>
>> Thanks!
>> John
>>
>> On Fri, May 1, 2020, at 15:28, Matthias J. Sax wrote:
>> > Both stores serve a different purpose.
>> >
>> > Regular stores allow you to store state the application computes.
>> > Writing into the changelog is a fault-tolerance mechanism.
>> >
>> > Global stores hold "auxiliary" data that is provided from "outside" of
>> > the app. There is no changelog topic, but only the input topic (that is
>> > used to re-create the global state).
>> >
>> > Local stores are sharded and updates are "sync" as they don't need to
>> > be shared with anybody else.
>> >
>> > For global stores, as all instances need to be updated, updates are
>> > async (we don't know when each instance will update its own global
>> > store replica).
>> >
>> > >> Say one stream thread updates the topic for the global store and
>> > >> starts processing the next event, wherein the processor tries to
>> > >> read the global store, which may not have been synced with the topic?
>> >
>> > Correct. There is no guarantee when the update to the global store will
>> > be applied. As said, global stores are not designed to hold data the
>> > application computes.
>> >
>> >
>> > -Matthias
>> >
>> >
>> > On 4/30/20 11:11 PM, Pushkar Deole wrote:
>> > > thanks... will try with GlobalKTable.
>> > > As a side question, I didn't really understand the significance of
>> > > the global state store, which kind of works in the reverse way to a
>> > > local state store, i.e.
>> > > the local state store is updated and then saved to the changelog
>> > > topic, whereas in the case of a global state store the topic is
>> > > updated first and then synced to the global state store. Do these two
>> > > work in sync, i.e. the update to the topic and to the global state
>> > > store?
>> > >
>> > > Say one stream thread updates the topic for the global store and
>> > > starts processing the next event, wherein the processor tries to read
>> > > the global store, which may not have been synced with the topic?
>> > >
>> > > On Fri, May 1, 2020 at 3:35 AM Matthias J. Sax <mj...@apache.org>
>> > > wrote:
>> > >
>> > >> Yes.
>> > >>
>> > >> A `GlobalKTable` uses a global store internally.
>> > >>
>> > >> You can also use `StreamsBuilder.addGlobalStore()` or
>> > >> `Topology.addGlobalStore()` to add a global store "manually".
>> > >>
>> > >>
>> > >> -Matthias
>> > >>
>> > >>
>> > >> On 4/30/20 7:42 AM, Pushkar Deole wrote:
>> > >>> Thanks Matthias.
>> > >>> Can you elaborate on the replicated caching layer part?
>> > >>> When you say global stores, do you mean a GlobalKTable created from
>> > >>> a topic, e.g. using the StreamsBuilder.globalTable(String topic)
>> > >>> method?
>> > >>>
>> > >>> On Thu, Apr 30, 2020 at 12:44 PM Matthias J. Sax <mj...@apache.org>
>> > >> wrote:
>> > >>>
>> > >>>> It's not possible to modify a state store from "outside".
>> > >>>>
>> > >>>> If you want to build a "replicated caching layer", you could use
>> > >>>> global stores and write into the corresponding topics to update
>> > >>>> all stores. Of course, those updates would be async.
>> > >>>>
>> > >>>>
>> > >>>> -Matthias
>> > >>>>
>> > >>>> On 4/29/20 10:52 PM, Pushkar Deole wrote:
>> > >>>>> Hi All,
>> > >>>>>
>> > >>>>> I am wondering if this is possible: I have been asked to use
>> > >>>>> state stores as a general replicated cache among multiple
>> > >>>>> instances of a service. However, the state store is created
>> > >>>>> through StreamsBuilder but is not actually modified through the
>> > >>>>> stream processor topology; rather, it is to be modified from
>> > >>>>> outside the stream topology. So, essentially, the state store is
>> > >>>>> just to be created from StreamsBuilder and then to be used as an
>> > >>>>> application-level cache that will get replicated between
>> > >>>>> application instances. Is this possible using state stores?
>> > >>>>>
>> > >>>>> Secondly, if possible, is this a good design approach?
>> > >>>>>
>> > >>>>> Appreciate your response since I don't know the internals of
>> > >>>>> state stores.
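The async update path discussed in this thread (write to the topic first, each instance's global-store replica catches up later) can be modeled with a toy simulation in plain Java. This is not Kafka Streams API; the class and method names are illustrative, and the pending-update queue stands in for the global store's input topic.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Toy model of the async global-store update path discussed above.
// Names (ReplicatedStore, applyNext) are illustrative, not Kafka API.
public class ReplicatedStore {
    private final Queue<String[]> topic = new ArrayDeque<>();   // pending updates
    private final Map<String, String> replica = new HashMap<>(); // this instance's copy

    // Producer side: the update is appended to the "topic", not applied yet.
    public void write(String key, String value) {
        topic.add(new String[] { key, value });
    }

    // Background sync: apply one pending update from the topic to the replica.
    public void applyNext() {
        String[] update = topic.poll();
        if (update != null) {
            replica.put(update[0], update[1]);
        }
    }

    public String read(String key) {
        return replica.get(key);
    }

    public static void main(String[] args) {
        ReplicatedStore store = new ReplicatedStore();
        store.write("agent-10", "Jane Doe");
        // A read that races ahead of the sync misses the just-written value:
        System.out.println(store.read("agent-10")); // prints "null"
        store.applyNext(); // the replica catches up with the topic
        System.out.println(store.read("agent-10")); // prints "Jane Doe"
    }
}
```

The window between `write` and `applyNext` is the inconsistency Pushkar describes: an event processed on another instance during that window reads a stale replica and is passed on without enrichment.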