Hi, looking for some beginner's help with architectural choices.

Scenario 1:

A stream of messages arrives on a Kafka topic. Each message has a widget ID as its key and, as its value, an indication of whether the widget is connected or disconnected. For each widget a message arrives every 30 seconds with the current connected state.

Requirement: emit a message when a widget has been reported as disconnected for at least five minutes.

Constraints: there can be large numbers of widgets, so horizontal scalability is required. The solution must be robust against the application, or instances of it, crashing, so any local/internal state must be persisted or be capable of being regenerated, and committing the input messages has to be got right so that they're reprocessed on a restart if and as necessary.

Observation: without the constraints we'd just read each message, keep an in-memory list of disconnected widgets with the timestamp at which each went disconnected, then when another "disconnected" status comes in, emit the output message if this widget has now been disconnected for at least five minutes. It looks like there's stuff in Kafka Streams that can help here, but I can't quite see where to start.

Scenario 2:

A stream of messages arrives on a Kafka topic. Each message has a widget ID as its key and, as its value, an indication of whether the widget is connected or disconnected. For each widget a message arrives only when the connected state changes.

Requirement and constraints: as above.

Observation: without the constraints we'd just read each message, keep an in-memory list of widgets whose most recent message indicated a disconnection, start a five minute timer for each of these, and emit the output message if the timer went off before a "reconnected" message was received. I'm less clear that there's anything out of the box in Kafka Streams that can do this sort of timeout on messages that never arrive, and I don't understand from the documentation how the various windowing features work.

Note: the numbers are for the sake of illustration.
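To pin down what the two "without the constraints" observations mean, here is a minimal in-memory sketch in plain Python. It is not Kafka Streams code, and it is neither horizontally scalable nor crash-safe, which is exactly the gap the question is about. The class names, the explicit `tick` call standing in for real timers, and the choice to emit only once per disconnection are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

TIMEOUT_SECONDS = 5 * 60  # illustrative figure, as in the scenarios


@dataclass
class HeartbeatTracker:
    """Scenario 1: every widget reports its state every 30 seconds."""
    timeout: float = TIMEOUT_SECONDS
    # widget_id -> timestamp of the first "disconnected" report in the current run
    disconnected_since: dict = field(default_factory=dict)
    # widgets already reported for the current run (assumption: emit once per run)
    alerted: set = field(default_factory=set)

    def on_message(self, widget_id, connected, ts):
        """Return the widget IDs to emit for this input (empty or one element)."""
        if connected:
            # A "connected" report ends the current disconnected run.
            self.disconnected_since.pop(widget_id, None)
            self.alerted.discard(widget_id)
            return []
        since = self.disconnected_since.setdefault(widget_id, ts)
        if ts - since >= self.timeout and widget_id not in self.alerted:
            self.alerted.add(widget_id)
            return [widget_id]
        return []


@dataclass
class ChangeTracker:
    """Scenario 2: a message arrives only when the state changes."""
    timeout: float = TIMEOUT_SECONDS
    # widget_id -> time at which the output message becomes due
    deadlines: dict = field(default_factory=dict)

    def on_message(self, widget_id, connected, ts):
        if connected:
            self.deadlines.pop(widget_id, None)  # cancel the pending timer
        else:
            self.deadlines.setdefault(widget_id, ts + self.timeout)

    def tick(self, now):
        """Stand-in for timers firing: return widgets whose deadline has passed."""
        due = sorted(w for w, d in self.deadlines.items() if d <= now)
        for w in due:
            del self.deadlines[w]  # emit once per disconnection
        return due
```

In Scenario 1 the output is driven by the next incoming message, so `on_message` alone is enough. In Scenario 2 nothing arrives while the widget stays disconnected, so something outside the message flow has to call `tick` periodically, and that is the part I can't map onto Kafka Streams.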
I can imagine different solutions being appropriate depending on whether (a) the timeout (five minutes as quoted) is, whilst configurable, guaranteed to be the same for every widget, or (b) there is a requirement to configure the timeout separately for each widget. (Solutions involving updating a database table and then polling it on a cron job are really the sort of thing we're trying to get away from here.)

Tim Ward