Thanks Ben! Nice to know I was on the right track :-) That Samza doc is very helpful too.

On Tue, May 3, 2016 at 8:53 PM Benjamin Manns <benma...@gmail.com> wrote:

> Both of your ideas are doable. Another thing to keep in mind is that
> depending on your data source, late arriving data will not be sorted in
> front of the already committed events. You may need some windowing buffer
> to recalculate for stragglers.
>
> For the multiple-topic approach, check out Samza's MessageChooser
> https://wiki.apache.org/samza/Pluggable%20MessageChooser - other stream
> processors may have something similar.
>
> On Tuesday, May 3, 2016, Kyle Mathews <mathews.k...@gmail.com> wrote:
>
> > Hi Kafka Users,
> > I'm thinking through how to convert my application to use Kafka. I use an
> > event sourcing model and something I do frequently is reprocess old events
> > when I change a model schema or update my processing code.
> >
> > In my current setup, I have few enough events that I can just load all the
> > event types that feed into a model and sort them all and then reprocess
> > them. There's starting to be enough events though now that loading/sorting
> > events in memory is getting slow and sometimes causing OOM crashes.
> >
> > So one very attractive thing about Kafka is that all events are sorted so
> > in theory, I just need to set a consumer's offset to 0 and things will just
> > work™. But I've read that each event should have its own topic which raises
> > the question how do I reprocess a model that's pulling from multiple topics
> > while maintaining the order of events across multiple topics.
> >
> > So for the User model, say I have two events, userCreated and userUpdated
> > each with a timestamp and an entity_id pointing to the user. If I'm
> > reprocessing these, is there a normal pattern for how to pull events in
> > order from multiple topics?
> >
> > One solution I've thought of is for producers to publish events to both
> > event-specific topics as well as model topics e.g. userCreated would get
> > published to the "userCreated" topic as well as the "user" topic.
> >
> > Another is that the stream processor for User, when reprocessing, would
> > just look at the next event from each topic it's pulling from and always
> > pull the oldest one next. Slightly tricky code but doable.
> >
> > Thoughts?
> >
>
>
> --
> Benjamin Manns
> benma...@gmail.com
> (434) 321-8324
>
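
For reference, the "always pull the oldest one next" idea from the quoted message is a k-way merge by timestamp. Below is a minimal sketch of just that ordering logic, not of any Kafka or Samza API. It assumes each topic can be read as an iterable of events that is already sorted by its own `timestamp` field; the function name `merge_by_timestamp` and the sample events are hypothetical. Late-arriving events, as Ben points out, would break that assumption and need a buffering window in front of the merge.

```python
import heapq
from typing import Any, Dict, Iterable, Iterator

Event = Dict[str, Any]


def merge_by_timestamp(*streams: Iterable[Event]) -> Iterator[Event]:
    """Lazily yield events from several per-topic streams, oldest first.

    Assumption: every input stream is already sorted by "timestamp".
    heapq.merge only holds the next pending event of each stream in memory,
    so the full history never has to be loaded and sorted at once.
    """
    return heapq.merge(*streams, key=lambda event: event["timestamp"])


# Hypothetical stand-ins for two topics read from offset 0.
user_created = [
    {"type": "userCreated", "entity_id": 1, "timestamp": 100},
    {"type": "userCreated", "entity_id": 2, "timestamp": 300},
]
user_updated = [
    {"type": "userUpdated", "entity_id": 1, "timestamp": 200},
    {"type": "userUpdated", "entity_id": 2, "timestamp": 400},
]

ordered = list(merge_by_timestamp(user_created, user_updated))
for event in ordered:
    print(event["timestamp"], event["type"], event["entity_id"])
```

When two events carry the same timestamp, `heapq.merge` emits the one from the stream listed first, so the argument order acts as a tie-breaker (for example, passing the userCreated stream before userUpdated).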