Hi, I have a question, or really more of a request for confirmation, and I hope this is the right place to post (please point me elsewhere if not). My question is about replaying events in a topology to rebuild the topology's state (by state I mean the KTables, state stores and so on).
To give a bit more grounding, I'll use the example of a stock system. In this system I have stock and locations. You can move stock to other locations, or be told by other systems where stock is.

Simple case, moving stock in the system: in this example the Kafka Streams topology consumes allocate commands, location events and item events. The topology builds up KTables from the events to understand the current state of where each item is, in order to validate whether an allocate command can proceed. To my knowledge, building up a KTable from events like this is a form of event sourcing (please correct me if I'm wrong).

Now let's say we have another condition for allocating stock: only certain locations have cold storage, and items that require cold storage can only be put into those locations. That information is in the original events, but it is not captured in the state built in the KTable. That means we need to recreate the KTable state by re-ingesting the events and building up the new state.

To do this we need to do two things: stop consuming allocate commands, and run the ./bin/kafka-streams-application-reset.sh script with the item topic and location topic as input topics. The reason to stop consuming allocate commands is to ensure the state is up to date before any new commands are processed (otherwise they may give incorrect results). The reason for not listing the allocate command topic as an input topic is to ensure we do not reprocess previous commands. Does this sound correct?

So that's reprocessing in the command-driven world; the other side is the event-driven world. For this example, imagine we are ingesting data from a third-party system about where items are located. The information does not explicitly state the location, so we need to work it out from what we are given. Let's say the first piece of information is that all items requiring cold storage are located in a cold store linked to where the stock comes from.
So if the stock was made in location X, it will be stored in location X. One option would be to listen for an item created event in the topology outlined above and allocate on it. But this causes a problem when resetting the application topology: we are both building the state from the item created events and allocating on them. Resetting the allocation topology with the item topic as a source will therefore cause the allocations to run again, and those re-run allocations will override any command-based allocations. I hope this makes sense; feel free to ask if I've not explained it clearly enough.

The key takeaway here is: don't event-source and do event-driven processing on the same event in the same topology.

To solve the above, I think you would need two topologies: one that creates an aggregate of the item state and publishes it to a topic, and another that performs the allocation. The reset on the first would then be a normal reset using the item and location topics, and the reset on the other (the allocation topology) would be against the aggregate topic. Does this sound correct?

I understand the above might not be too clear, as I tend to use DDD terminology quite freely, so please feel free to ask for more detail. I'm looking for confirmation or corrections; both are welcome.

Thanks,
Mark
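
P.S. In case it helps, here is a rough plain-Java sketch of the two-topology split I am describing. It is not Kafka Streams code: in-memory maps stand in for the KTables, and every name in it (ReplaySketch, ItemEvent, LocationEvent, ItemState, rebuildItems, rebuildColdStorage, canAllocate) is made up for illustration. The idea is that the first part only folds events into state, so it can be replayed freely, while the second part only reads that state to validate a command.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ReplaySketch {
    // Hypothetical event and state shapes, invented for this illustration.
    record LocationEvent(String locationId, boolean hasColdStorage) {}
    record ItemEvent(String itemId, String locationId, boolean needsColdStorage) {}
    record ItemState(String locationId, boolean needsColdStorage) {}

    // "Topology 1": pure event sourcing. It only folds events into state and
    // triggers nothing else, so replaying from the start has no side effects.
    static Map<String, ItemState> rebuildItems(List<ItemEvent> itemEvents) {
        Map<String, ItemState> items = new HashMap<>();
        for (ItemEvent e : itemEvents) {
            items.put(e.itemId(), new ItemState(e.locationId(), e.needsColdStorage()));
        }
        return items;
    }

    // Same idea for locations: latest event per location wins.
    static Map<String, Boolean> rebuildColdStorage(List<LocationEvent> locationEvents) {
        Map<String, Boolean> cold = new HashMap<>();
        for (LocationEvent e : locationEvents) {
            cold.put(e.locationId(), e.hasColdStorage());
        }
        return cold;
    }

    // "Topology 2": validates an allocate command against the aggregates.
    // It reads the state but never rebuilds it.
    static boolean canAllocate(String itemId, String targetLocationId,
                               Map<String, ItemState> items,
                               Map<String, Boolean> coldStorage) {
        ItemState item = items.get(itemId);
        if (item == null || !coldStorage.containsKey(targetLocationId)) {
            return false; // unknown item or unknown location
        }
        return !item.needsColdStorage() || coldStorage.get(targetLocationId);
    }

    public static void main(String[] args) {
        List<LocationEvent> locations = List.of(
                new LocationEvent("X", true), new LocationEvent("Y", false));
        List<ItemEvent> itemEvents = List.of(new ItemEvent("item-1", "X", true));

        // Replaying the same events twice should give the same state.
        Map<String, ItemState> items = rebuildItems(itemEvents);
        System.out.println(items.equals(rebuildItems(itemEvents)));

        Map<String, Boolean> cold = rebuildColdStorage(locations);
        System.out.println(canAllocate("item-1", "Y", items, cold)); // Y has no cold storage
        System.out.println(canAllocate("item-1", "X", items, cold)); // X has cold storage
    }
}
```

In the real system I would expect the output of the first part to be the aggregate topic, and the second part to be the allocation topology that reads it alongside the allocate commands.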