Thanks, Ben! Nice to know I was on the right track :-) That Samza doc is
very helpful too.
On Tue, May 3, 2016 at 8:53 PM Benjamin Manns <benma...@gmail.com> wrote:

> Both of your ideas are doable. Another thing to keep in mind is that,
> depending on your data source, late-arriving data will not be sorted in
> front of already-committed events, even if its timestamp is earlier. You
> may need some windowing buffer to recalculate for stragglers.
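One reading of the windowing-buffer suggestion, as a minimal sketch in
plain Python with no Kafka client involved. Assumptions not from the
thread: events carry integer timestamps, and ReorderBuffer and its
lateness parameter are hypothetical names. Each event is held until the
newest timestamp seen is at least lateness ahead of it, then released
oldest first. Anything that shows up after newer events have already
gone out is set aside in too_late for the kind of recalculation Ben
mentions.

```python
import heapq
import itertools
from typing import Any, List, Optional, Tuple


class ReorderBuffer:
    """Hold events briefly and release them in timestamp order.

    `lateness` is a hypothetical tolerance in timestamp units, not a Kafka
    or Samza setting.
    """

    def __init__(self, lateness: int) -> None:
        self.lateness = lateness
        self._heap: List[Tuple[int, int, Any]] = []
        self._seq = itertools.count()  # tie-breaker so payloads are never compared
        self._max_seen: Optional[int] = None
        self._last_emitted: Optional[int] = None
        self.too_late: List[Tuple[int, Any]] = []  # stragglers to recalculate

    def add(self, ts: int, event: Any) -> List[Tuple[int, Any]]:
        """Buffer one event; return whatever is now safe to emit, oldest first."""
        if self._last_emitted is not None and ts < self._last_emitted:
            # A newer event was already emitted, so this one can't be slotted in.
            self.too_late.append((ts, event))
            return []
        heapq.heappush(self._heap, (ts, next(self._seq), event))
        self._max_seen = ts if self._max_seen is None else max(self._max_seen, ts)
        return self._release(self._max_seen - self.lateness)

    def flush(self) -> List[Tuple[int, Any]]:
        """Emit everything still buffered, e.g. at the end of a replay."""
        return self._release(float("inf"))

    def _release(self, watermark: float) -> List[Tuple[int, Any]]:
        out = []
        while self._heap and self._heap[0][0] <= watermark:
            ts, _, event = heapq.heappop(self._heap)
            out.append((ts, event))
            self._last_emitted = ts
        return out


buf = ReorderBuffer(lateness=5)
emitted = []
for ts, ev in [(1, "a"), (4, "b"), (3, "c"), (9, "d"), (2, "e"), (12, "f")]:
    emitted += buf.add(ts, ev)
emitted += buf.flush()
print(emitted)       # [(1, 'a'), (3, 'c'), (4, 'b'), (9, 'd'), (12, 'f')]
print(buf.too_late)  # [(2, 'e')]
```

The trade-off is latency versus correctness: a larger lateness catches
more stragglers but holds every event back for longer.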
>
> For the multiple-topic approach, check out Samza's MessageChooser
> https://wiki.apache.org/samza/Pluggable%20MessageChooser - other stream
> processors may have something similar.
>
> On Tuesday, May 3, 2016, Kyle Mathews <mathews.k...@gmail.com> wrote:
>
> > Hi Kafka Users,
> > I'm thinking through how to convert my application to use Kafka. I use an
> > event sourcing model, and something I do frequently is reprocess old
> > events when I change a model schema or update my processing code.
> >
> > In my current setup, I have few enough events that I can just load all
> > the event types that feed into a model, sort them, and then reprocess
> > them. There are starting to be enough events now, though, that loading
> > and sorting events in memory is getting slow and sometimes causes OOM
> > crashes.
> >
> > So one very attractive thing about Kafka is that events are stored in
> > order (within a partition), so in theory I just need to set a consumer's
> > offset to 0 and things will just work™. But I've read that each event
> > type should have its own topic, which raises the question of how to
> > reprocess a model that pulls from multiple topics while maintaining the
> > order of events across those topics.
> >
> > So for the User model, say I have two event types, userCreated and
> > userUpdated, each with a timestamp and an entity_id pointing to the user.
> > If I'm reprocessing these, is there a standard pattern for pulling events
> > in order from multiple topics?
> >
> > One solution I've thought of is for producers to publish events to both
> > event-specific topics and model topics, e.g., userCreated would get
> > published to the "userCreated" topic as well as the "user" topic.
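A minimal sketch of the dual-publish idea, using an in-memory dict as a
stand-in for Kafka topics (publish, partition_for and NUM_PARTITIONS are
hypothetical names). One detail matters here: Kafka only guarantees
ordering within a partition, so the "user" topic keeps a user's events in
order only if they are keyed by entity_id and therefore all land in the
same partition.

```python
import zlib
from collections import defaultdict
from typing import Dict, List

NUM_PARTITIONS = 3  # assumed partition count, the same for every topic here

# In-memory stand-in for Kafka: topic -> partition -> append-only list of events.
log: Dict[str, Dict[int, List[dict]]] = defaultdict(lambda: defaultdict(list))


def partition_for(key: str) -> int:
    # Stand-in for key-based partitioning. Kafka's default partitioner hashes
    # the key with murmur2 rather than CRC32, but either way the same key maps
    # to the same partition as long as the partition count doesn't change.
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def publish(event_type: str, model: str, event: dict) -> None:
    """Hypothetical helper: append the event to its event topic and its model topic."""
    p = partition_for(event["entity_id"])
    for topic in (event_type, model):
        log[topic][p].append(event)


publish("userCreated", "user", {"entity_id": "u1", "ts": 1})
publish("userCreated", "user", {"entity_id": "u2", "ts": 2})
publish("userUpdated", "user", {"entity_id": "u1", "ts": 3})

# Everything about u1 sits in one partition of "user", in publish order.
u1_events = [e for e in log["user"][partition_for("u1")] if e["entity_id"] == "u1"]
print([e["ts"] for e in u1_events])  # [1, 3]
```

The order this gives is per user, not global across users. Also, the two
appends are independent writes: with a real producer, one send could
succeed while the other fails.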
> >
> > Another is that the stream processor for User, when reprocessing, would
> > just look at the next event from each topic it's pulling from and always
> > pull the oldest one next. Slightly tricky code but doable.
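A minimal sketch of the "look at the next event from each topic and
always pull the oldest one next" approach for a bounded replay, in plain
Python with lists standing in for topic consumers (merge_topics and the
sample data are hypothetical). Since Kafka only orders events within a
partition, each partition is really its own sorted input, so in practice
you'd pass one iterable per partition rather than per topic.

```python
import heapq
from itertools import repeat
from typing import Dict, Iterable, Iterator, Tuple


def merge_topics(topics: Dict[str, Iterable[dict]]) -> Iterator[Tuple[str, dict]]:
    """Yield (topic, event) pairs, always taking the oldest head event next.

    Assumes each iterable is already sorted by event["ts"]. heapq.merge does
    not pull its inputs into memory all at once, unlike load-then-sort.
    """
    # zip(repeat(name), events) tags each event with its topic name; repeat()
    # captures `name` immediately, avoiding late binding in the comprehension.
    streams = [zip(repeat(name), events) for name, events in topics.items()]
    return heapq.merge(*streams, key=lambda item: item[1]["ts"])


user_created = [{"entity_id": "u1", "ts": 1}, {"entity_id": "u2", "ts": 4}]
user_updated = [{"entity_id": "u1", "ts": 2}, {"entity_id": "u1", "ts": 5}]
for topic, event in merge_topics({"userCreated": user_created, "userUpdated": user_updated}):
    print(topic, event["entity_id"], event["ts"])
# userCreated u1 1
# userUpdated u1 2
# userCreated u2 4
# userUpdated u1 5
```

This works cleanly for a replay of data that is already written. On a
live stream, an input with nothing new may just be lagging, so the
consumer has to decide how long to wait before moving on; Ben's
MessageChooser link covers where that kind of choice plugs into Samza.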
> >
> > Thoughts?
> >
>
>
> --
> Benjamin Manns
> benma...@gmail.com
> (434) 321-8324
>
