Hi Tim, The general approach used by Streams is resilience by wrapping all state updates in a "changelog topic". That is, when Streams updates a key/value pair in the state store, it also sends the update to a special topic associated with that store. The record is only considered "committed" aka "fully processed" after the write to the changelog topic is acknowledged (among other writes).
Upon start-up (or restart), we'll attempt to use any on-disk state stores we find, but if they are missing or corrupted, we'll just rebuild the whole state store from the changelog topic. So the durability is actually provided by the kafka brokers, and using on-disk state stores is an optimization where available. Thus, you should be fine if you completely lose the disk at any phase of processing (before or after a disconnect). However, depending on the size of your state store, you may find that recovery from the changelog is on the slow side. For this reason, it's probably a good idea to use stateful sets with K8s if possible. Does this help? Thanks, -John On Wed, Sep 12, 2018 at 7:50 AM Tim Ward <tim.w...@origamienergy.com> wrote: > From: John Roesler <j...@confluent.io> > > As you noticed, a windowed computation won't work here, because you would > > be wanting to alert on things that are absent from the window. > > > Instead, you can use a custom Processor with a Key/Value store and > schedule > > punctuations to send the alerts. For example, you can store the state and > > the time of the transition to that state, and finally whether an alert > has > > been sent for that widget. You can schedule a punctuation to scan over > the > > whole store. For any record that's been disconnected for more than 5m and > > *not* sent an alert, you send the alert and set the "sent" flag. Since > you > > only need to consider widgets whose last transition was a disconnect and > > that have *not* had an alert sent, you can keep the store pretty compact > by > > dropping entries when you send the alert or when they transition from > > "disconnected to connected". So the store doesn't need to contain any > > widget whose state is currently "connected" or who is disconnected and > has > > already been alerted. > > Ta, I'll give something like that a try (the other scenario is simpler so > I'll do the harder one first). > > One question: how does resilience work? If for example the application > crashes after receiving a "disconnected" message but before timing it out? > Does the preservation of the local data store across application restarts > just sort all this out for me automagically so I don't have to worry about > it? (I'll be deploying to Kubernetes, and applications going away and > restarting at random seems to be a fact of life there.) > > Tim Ward > The contents of this email and any attachment are confidential to the > intended recipient(s). If you are not an intended recipient: (i) do not > use, disclose, distribute, copy or publish this email or its contents; (ii) > please contact the sender immediately; and (iii) delete this email. Our > privacy policy is available here: > https://origamienergy.com/privacy-policy/. Origami Energy Limited > (company number 8619644); Origami Storage Limited (company number 10436515) > and OSSPV001 Limited (company number 10933403), each registered in England > and each with a registered office at: Ashcombe Court, Woolsack Way, > Godalming, GU7 1LQ. >