Also, since you're going to be creating a topic per user, the number of
concurrent users will also be a concern, as Kafka doesn't handle massive
numbers of topics well.

Tim
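The usual way around that topic explosion is a single shared topic with the
user id as the message key, so that partitioning, rather than a topic per
user, keeps each user's messages together and in order. A minimal sketch of
that alternative, assuming the modern Kafka Java producer (which postdates
this 2013 thread); the topic name comes from the discussion below, while the
class name and JSON payload are illustrative:

    import java.util.Properties;

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class GuestDataProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaProducer<String, String> producer =
                    new KafkaProducer<>(props)) {
                // One shared topic for all users; the user id is the message
                // key, so all of a user's data messages hash to the same
                // partition and stay in order, with no topic per user.
                producer.send(new ProducerRecord<>("GUEST_DATA", "user123",
                        "{\"type\":\"address\",\"value\":\"123 Main St\"}"));
            }
        }
    }

Consumers would then filter on the message key rather than subscribing to a
per-user topic.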
On Thu, Jun 13, 2013 at 10:47 AM, Josh Foure <user...@yahoo.com> wrote:

> Hi Mahendra, I think that is where it gets a little tricky. I think it
> would work something like this:
>
> 1. Web sends a login event for user "user123" to topic "GUEST_EVENT".
> 2. All of the systems consume those messages and publish their data
> messages to topic "GUEST_DATA.user123".
> 3. The Recommendation system gets all of the data from
> "GUEST_DATA.user123", processes it, and then publishes back to the same
> topic, "GUEST_DATA.user123".
> 4. The Web consumes the messages from that same topic (there is a
> different topic for every user that has logged in), and when it finds the
> recommendation messages it pushes them to the browser (note that it will
> need to read all the other data messages and discard them while looking
> for the recommendation messages).
>
> I have a concern that the Web will be flooded with a ton of messages that
> it will promptly drop, but I don't want to create a new "response" or
> "recommendation" topic, because then I feel like I am tightly coupling the
> message to the functionality, and in the future different systems may want
> to consume those messages as well.
>
> Does that make sense?
> Josh
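Step 4 of Josh's flow amounts to a consumer that reads everything on the
per-user topic and keeps only the recommendation messages, which is exactly
where the flooding concern comes from. A rough sketch, again assuming the
modern Java consumer API; the group id, the "type" field in the payload, and
pushToBrowser are illustrative stand-ins:

    import java.time.Duration;
    import java.util.List;
    import java.util.Properties;

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class WebRecommendationConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("group.id", "web");
            props.put("key.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer =
                    new KafkaConsumer<>(props)) {
                consumer.subscribe(List.of("GUEST_DATA.user123"));
                while (true) {
                    ConsumerRecords<String, String> records =
                            consumer.poll(Duration.ofMillis(500));
                    for (ConsumerRecord<String, String> record : records) {
                        // Every data message on the user's topic arrives
                        // here; anything that is not a recommendation is
                        // read and then dropped, which is the flooding
                        // concern Josh raises above.
                        if (record.value()
                                .contains("\"type\":\"recommendation\"")) {
                            pushToBrowser(record.value());
                        }
                    }
                }
            }
        }

        // Stand-in for the real push to the user's browser session.
        static void pushToBrowser(String recommendation) {
            System.out.println("push: " + recommendation);
        }
    }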
>
>
> ________________________________
> From: Mahendra M <mahendr...@gmail.com>
> To: users@kafka.apache.org; Josh Foure <user...@yahoo.com>
> Sent: Thursday, June 13, 2013 12:56 PM
> Subject: Re: Using Kafka for "data" messages
>
> Hi Josh,
>
> The idea looks very interesting. I just had one doubt.
>
> 1. A user logs in. His login id is sent on a topic.
> 2. Other systems (consumers on this topic) consume this message and
> publish their results to another topic.
>
> This will be happening, in no particular order, for hundreds of users.
>
> Now, when the site is being displayed to the user, how will you fetch
> only the messages for that user from the queue?
>
> Regards,
> Mahendra
>
>
> On Thu, Jun 13, 2013 at 8:51 PM, Josh Foure <user...@yahoo.com> wrote:
>
> > Hi all, my team is proposing a novel way of using Kafka, and I am
> > hoping someone can help do a sanity check on it:
> >
> > 1. When a user logs into our website, we will create a "logged in"
> > event message in Kafka containing the user id.
> > 2. 30+ systems (consumers, each in their own consumer group) will
> > consume this event and look up data about this user id. They will then
> > publish all of this data back out into Kafka as a series of data
> > messages. One message may include the user's name, another the user's
> > address, another the user's last 10 searches, another their last 10
> > orders, etc. The plan is that a single "logged in" event may trigger
> > hundreds if not thousands of additional data messages.
> > 3. Another system, the "Product Recommendation" system, will have
> > consumed the original "logged in" message and will also consume a
> > subset of the data messages (realistically, I think it would need to
> > consume all of the data messages but discard the ones it doesn't
> > need). As the Product Recommendation system consumes the data
> > messages, it will compute recommended products and publish
> > recommendation messages (which get more and more specific as it
> > consumes more and more data messages).
> > 4. The original website will consume the recommendation messages and
> > show the recommendations to the user as it gets them.
> >
> > You don't see many systems implemented this way, but since Kafka has
> > much higher throughput than your typical MOM, this approach seems
> > innovative.
> >
> > The benefits are:
> >
> > 1. If we start collecting more information about the users, we can
> > simply start publishing that in new data messages, and consumers can
> > start processing those messages whenever they want. If we were doing
> > this in a more traditional SOA approach, the schemas would need to
> > change every time we added a field, but with this approach we can just
> > create new messages without touching existing ones.
> > 2. We are looking to make our systems smaller, so if we end up with
> > more, smaller systems that each publish a small number of events, it
> > becomes easier to make changes and test them. If we were doing this in
> > a more traditional SOA approach, we would need to retest each consumer
> > every time we changed our bigger SOA services.
> >
> > The downsides appear to be:
> >
> > 1. We may be publishing a large amount of data that never gets used
> > but that everyone needs to consume to see if they need it before
> > discarding it.
> > 2. The Product Recommendation system may need to wait until it has
> > consumed a number of messages, keeping track of all the data
> > internally, before it can start processing.
> > 3. While we may be able to keep the messages somewhat small, the fact
> > that they contain data means they will be bigger than your traditional
> > EDA messages.
> > 4. It seems like we can do a lot of this using SOA (we already have an
> > ESB that can do transformations to address consumers expecting an
> > older version of the data).
> >
> > Any insight is appreciated.
> > Thanks,
> > Josh
>
>
> --
> Mahendra
>
> http://twitter.com/mahendra
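Steps 1 and 2 of the original proposal describe a fan-out: each of the 30+
systems consumes the login event in its own consumer group and publishes
what it knows about the user back as data messages. A minimal sketch of one
such enricher, using the topic names from the thread; the modern Java client
is assumed, and the class name, group id, payload format, and address lookup
are hypothetical:

    import java.time.Duration;
    import java.util.List;
    import java.util.Properties;

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class AddressEnricher {
        public static void main(String[] args) {
            Properties cProps = new Properties();
            cProps.put("bootstrap.servers", "localhost:9092");
            // Each of the 30+ systems uses its own group id, so every
            // system sees every login event.
            cProps.put("group.id", "address-service");
            cProps.put("key.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");
            cProps.put("value.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");

            Properties pProps = new Properties();
            pProps.put("bootstrap.servers", "localhost:9092");
            pProps.put("key.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
            pProps.put("value.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaConsumer<String, String> consumer =
                         new KafkaConsumer<>(cProps);
                 KafkaProducer<String, String> producer =
                         new KafkaProducer<>(pProps)) {
                consumer.subscribe(List.of("GUEST_EVENT"));
                while (true) {
                    for (ConsumerRecord<String, String> event :
                            consumer.poll(Duration.ofSeconds(1))) {
                        // Assume the login event body is the user id.
                        String userId = event.value();
                        // Step 2: publish what this system knows about the
                        // user as a data message on the per-user topic.
                        producer.send(new ProducerRecord<>(
                                "GUEST_DATA." + userId, userId,
                                "{\"type\":\"address\",\"value\":\""
                                        + lookUpAddress(userId) + "\"}"));
                    }
                }
            }
        }

        // Hypothetical lookup; stands in for a real datastore call.
        static String lookUpAddress(String userId) {
            return "123 Main St";
        }
    }

Note that this sketch inherits downside 1 above: every enriched message is
published whether or not any consumer ends up needing it.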