Hi, looking for some beginner's help with architectural choices.

Scenario 1:

A stream of messages arrives on a Kafka topic. Each has the key being a widget 
ID, and the value being an indication of whether the widget is connected or 
disconnected. For each widget a message arrives every 30 seconds with the 
current connected state.

Requirement: emit a message when a widget has been reported as disconnected for 
at least five minutes.

Constraints: there can be large numbers of widgets, so horizontal scalability 
is required. The solution must be robust against the application, or instances 
of it, crashing, so persistence or regeneration of any local/internal state is 
required, and committing the input messages has to be got right so that they're 
reprocessed on a restart if and as necessary.

Observation: without the constraints we'd just read each message, keep an 
in-memory list of disconnected widgets with the timestamp at which they went 
disconnected, then when another "disconnected" status comes in emit the output 
message if this widget has now been disconnected for more than five minutes.

It looks like there's stuff in Kafka Streams that can help here, but I can't 
see quite where to start?

Scenario 2:

A stream of messages arrives on a Kafka topic. Each has the key being a widget 
ID, and the value being an indication of whether the widget is connected or 
disconnected. For each widget a message arrives only when the connected state 
changes.

Requirement and constraints: as above

Observation: without the constraints we'd just read each message, keep an 
in-memory list of widgets whose most recent message indicated a disconnection, 
and for each of these start a five minute timer, and if the timer went off 
before a "reconnected" message was received emit the output message.

I'm less clear that there's anything out of the box in Kafka Streams that can 
do this sort of timeout of non-existent messages? - but I don't understand from 
the documentation how the various windowing features work.

Note: the numbers are for the sake of illustration. I can imagine different 
solutions being appropriate depending on whether (a) the timeout (five minutes 
as quoted) is, whilst configurable, guaranteed to be the same for every widget, 
or (b) there is a requirement to configure the timeout separately for each 
widget.

(Solutions involving updating a database table and then polling it on a cron 
job are really the sort of thing we're trying to get away from here.)

Tim Ward

The contents of this email and any attachment are confidential to the intended 
recipient(s). If you are not an intended recipient: (i) do not use, disclose, 
distribute, copy or publish this email or its contents; (ii) please contact the 
sender immediately; and (iii) delete this email. Our privacy policy is 
available here: https://origamienergy.com/privacy-policy/. Origami Energy 
Limited (company number 8619644); Origami Storage Limited (company number 
10436515) and OSSPV001 Limited (company number 10933403), each registered in 
England and each with a registered office at: Ashcombe Court, Woolsack Way, 
Godalming, GU7 1LQ.

Reply via email to