Hi Kafka Users,
I'm thinking through how to convert my application to use Kafka. I use an
event-sourcing model, and something I do frequently is reprocess old events
when I change a model schema or update my processing code.

In my current setup, I have few enough events that I can just load all the
event types that feed into a model, sort them, and reprocess them. There
are now enough events, though, that loading and sorting them in memory is
getting slow and sometimes causes OOM crashes.

So one very attractive thing about Kafka is that events are stored in order
(within a partition, at least), so in theory I just need to reset a
consumer's offset to 0 and things will just work™. But I've read that each
event type should get its own topic, which raises the question: how do I
reprocess a model that pulls from multiple topics while maintaining the
order of events across those topics?

So for the User model, say I have two event types, userCreated and
userUpdated, each with a timestamp and an entity_id pointing to the user.
If I'm reprocessing these, is there a standard pattern for pulling events
in order from multiple topics?
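
For reference, here's roughly the rewind I'm picturing, using the plain
Java consumer (the group id is made up; the seek happens in the rebalance
listener so it applies to whatever partitions we get assigned):

    import java.util.*;
    import org.apache.kafka.clients.consumer.*;
    import org.apache.kafka.common.TopicPartition;

    public class RewindToZero {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("group.id", "user-model-rebuild");  // made-up group id
            props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

            KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
            consumer.subscribe(Arrays.asList("userCreated", "userUpdated"),
                new ConsumerRebalanceListener() {
                    public void onPartitionsRevoked(Collection<TopicPartition> parts) {}
                    public void onPartitionsAssigned(Collection<TopicPartition> parts) {
                        consumer.seekToBeginning(parts);  // rewind to offset 0
                    }
                });
            // ... normal poll loop from here
        }
    }

(Using a fresh group.id with auto.offset.reset=earliest gets the same
effect without the listener.)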

One solution I've thought of is for producers to publish events both to
event-specific topics and to model topics, e.g. userCreated would be
published to the "userCreated" topic as well as the "user" topic.
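
Roughly this, on the producer side (the entity id and payload are made up;
keying by entity_id keeps per-user ordering in both topics):

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class DualPublish {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
            KafkaProducer<String, String> producer = new KafkaProducer<>(props);

            String entityId = "user-123";                   // made-up id
            String payload = "{\"event\":\"userCreated\"}"; // made-up payload

            // Same event goes to the type-specific topic and the model topic.
            producer.send(new ProducerRecord<>("userCreated", entityId, payload));
            producer.send(new ProducerRecord<>("user", entityId, payload));
            producer.close();
        }
    }

The downside I can see: it doubles storage, and the two sends can
partially fail unless they're wrapped in a producer transaction.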

Another is that the stream processor for User, when reprocessing, would
peek at the next event from each topic it's pulling from and always
consume the oldest one next. Slightly tricky code, but doable.
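
Something like this sketch, assuming timestamps are comparable across
topics (it only emits when every topic has a buffered record, so an older
event that hasn't been polled yet can't be skipped):

    import java.time.Duration;
    import java.util.*;
    import org.apache.kafka.clients.consumer.*;

    public class ReplayMerge {
        public static void main(String[] args) {
            List<String> topics = Arrays.asList("userCreated", "userUpdated");

            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("group.id", "user-replay");        // made-up group id
            props.put("auto.offset.reset", "earliest");  // fresh group reads from 0
            props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

            KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
            consumer.subscribe(topics);

            // One FIFO buffer per topic; always emit the buffered record with
            // the smallest timestamp.
            Map<String, Deque<ConsumerRecord<String, String>>> buffers = new HashMap<>();
            while (true) {
                for (ConsumerRecord<String, String> rec : consumer.poll(Duration.ofMillis(500))) {
                    buffers.computeIfAbsent(rec.topic(), t -> new ArrayDeque<>()).add(rec);
                }
                while (buffers.size() == topics.size()
                        && buffers.values().stream().noneMatch(Deque::isEmpty)) {
                    ConsumerRecord<String, String> oldest = buffers.values().stream()
                            .map(Deque::peek)
                            .min(Comparator.comparingLong(ConsumerRecord::timestamp))
                            .get();
                    buffers.get(oldest.topic()).poll();
                    apply(oldest);  // fold the event into the User model
                }
            }
        }

        static void apply(ConsumerRecord<String, String> rec) {
            System.out.printf("%d %s %s%n", rec.timestamp(), rec.topic(), rec.value());
        }
    }

The wrinkle is end-of-stream: if one topic runs dry the merge stalls, so
I'd need some end-of-replay marker or a timeout before draining the rest.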

Thoughts?
