Ah yes, I had read that Kafka prefers fewer than 1,000 topics but I wasn't 
sure if that was a hard limitation.  In principle I wouldn't mind having all 
guest events placed on a single "GUEST_DATA" topic, but I thought that by 
having more topics I could minimize consumers reading messages only to 
discard them.  My thought was that if I have 20 Web JVMs and at any given 
time 1,000 people logged in per JVM, each JVM would only need to consume the 
messages from 1,000 topics.  If instead there is a single topic, each JVM 
will consume from that same topic (each in its own consumer group), but 19 
out of 20 messages will be for guests who are not even logged into that JVM.  
Since Kafka doesn't have message selectors or anything like that, I was 
hoping to use topics to segregate the traffic.  I don't want to use one 
topic per Web JVM because other consumers may be interested in that same 
data in the future, and the services that put the data into Kafka shouldn't 
have to look up which JVM a user is logged into (or get that from another 
message and keep track of it).  Any thoughts on how to work around this?  I 
know there are topic partitions, but if I understood correctly those seem 
more like a way to distribute the workload of storing the messages than a 
way to do the kind of message selection I am describing.
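
To make the single-topic scenario above concrete, here is roughly what each 
Web JVM would end up doing.  This is just a minimal sketch using today's 
Java consumer API; the broker address, the group name, the loggedInUsers 
set, and the assumption that messages are keyed by user id are all made-up 
placeholders:

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import java.util.Set;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class WebJvmGuestDataConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");   // placeholder broker
            props.put("group.id", "web-jvm-07");                 // one group per Web JVM
            props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

            // In practice ~1,000 users per JVM; hard-coded here for the sketch.
            Set<String> loggedInUsers = Set.of("user123", "user456");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("GUEST_DATA"));
                while (true) {
                    ConsumerRecords<String, String> records =
                        consumer.poll(Duration.ofMillis(500));
                    for (ConsumerRecord<String, String> rec : records) {
                        // With 20 JVMs, roughly 19 out of 20 records fail this
                        // check and are discarded: exactly the waste described above.
                        if (loggedInUsers.contains(rec.key())) {
                            handle(rec);
                        }
                    }
                }
            }
        }

        static void handle(ConsumerRecord<String, String> rec) {
            // push to the user's browser session
        }
    }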




________________________________
 From: Timothy Chen <tnac...@gmail.com>
To: users@kafka.apache.org; Josh Foure <user...@yahoo.com> 
Sent: Thursday, June 13, 2013 2:13 PM
Subject: Re: Using Kafka for "data" messages
 

Also, since you're going to be creating a topic per user, the number of
concurrent users will also be a concern: Kafka doesn't handle massive
numbers of topics well.

Tim


On Thu, Jun 13, 2013 at 10:47 AM, Josh Foure <user...@yahoo.com> wrote:

> Hi Mahendra, that is where it gets a little tricky.  I think it would
> work something like this:
>
> 1.  Web sends login event for user "user123" to topic "GUEST_EVENT".
> 2.  All of the systems consume those messages and publish the data
> messages to topic "GUEST_DATA.user123".
> 3.  The Recommendation system gets all of the data from
> "GUEST_DATA.user123", processes it, and then publishes back to the same
> topic "GUEST_DATA.user123".
> 4.  The Web consumes the messages from that same per-user topic
> "GUEST_DATA.user123" (there is a different topic for every logged-in
> user), and when it finds the recommendation messages it pushes them to
> the browser (note it will need to read all the other data messages and
> discard them while looking for the recommendation messages).  I have a
> concern that the Web will be flooded with a ton of messages that it will
> promptly drop, but I don't want to create a new "response" or
> "recommendation" topic because then I feel I am tightly coupling the
> message to the functionality, and in the future different systems may
> want to consume those messages as well.  (A rough sketch of the publish
> side in step 2 is below.)
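>
> To make step 2 above concrete, each backend system would do something
> like this (a minimal sketch using today's Java producer API; the broker
> address and the JSON payload are made up for illustration):
>
>     import java.util.Properties;
>     import org.apache.kafka.clients.producer.KafkaProducer;
>     import org.apache.kafka.clients.producer.ProducerRecord;
>     import org.apache.kafka.common.serialization.StringSerializer;
>
>     public class GuestDataPublisher {
>         public static void main(String[] args) {
>             Properties props = new Properties();
>             props.put("bootstrap.servers", "localhost:9092");  // placeholder broker
>             props.put("key.serializer", StringSerializer.class.getName());
>             props.put("value.serializer", StringSerializer.class.getName());
>
>             String userId = "user123";  // taken from the GUEST_EVENT login message
>             String dataJson = "{\"lastOrders\": []}";  // whatever data this system owns
>
>             try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
>                 // One topic per logged-in user, as in step 2.
>                 producer.send(new ProducerRecord<>("GUEST_DATA." + userId, userId, dataJson));
>             }
>         }
>     }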
>
> Does that make sense?
> Josh
>
>
>
>
>
>
> ________________________________
>  From: Mahendra M <mahendr...@gmail.com>
> To: users@kafka.apache.org; Josh Foure <user...@yahoo.com>
> Sent: Thursday, June 13, 2013 12:56 PM
> Subject: Re: Using Kafka for "data" messages
>
>
> Hi Josh,
>
> The idea looks very interesting. I just had one doubt.
>
> 1. A user logs in. His login id is sent on a topic.
> 2. Other systems (consumers on this topic) consume this message and
> publish their results to another topic.
>
> This will be happening, in no particular order, for hundreds of users.
>
> Now, when the site is being displayed to the user, how will you fetch
> only the messages for that user from the queue?
>
> Regards,
> Mahendra
>
>
>
> On Thu, Jun 13, 2013 at 8:51 PM, Josh Foure <user...@yahoo.com> wrote:
>
> >
> > Hi all, my team is proposing a novel
> > way of using Kafka and I am hoping someone can help do a sanity check on
> > this:
> >
> > 1.  When a user logs
> > into our website, we will create a “logged in” event message in Kafka
> > containing the user id.
> > 2.  30+ systems (consumers, each in its own consumer group) will
> > consume this event and look up data about this user id.  They will
> > then publish all of this data back out into Kafka as a series of data
> > messages.  One message may include the user’s name, another the
> > user’s address, another the user’s last 10 searches, another their
> > last 10 orders, etc.  The plan is that a single “logged in” event may
> > trigger hundreds if not thousands of additional data messages.
> > 3.  Another system, the “Product Recommendation” system, will have
> > consumed the original “logged in” message and will also consume a
> > subset of the data messages (realistically I think it would need to
> > consume all of the data messages but would discard the ones it
> > doesn’t need).  As the Product Recommendation system consumes the
> > data messages, it will compute recommended products and publish
> > recommendation messages (that get more and more specific as it
> > consumes more and more data messages; see the sketch after this
> > list).
> > 4.  The original
> > website will consume the recommendation messages and show the
> > recommendations to
> > the user as it gets them.
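> >
> > To illustrate step 3, the Product Recommendation system's main loop
> > might look roughly like this (a sketch using today's Java client and
> > the per-user topic naming from the reply above; consumerProps,
> > producerProps, isRelevant, and recommendFor are all hypothetical):
> >
> >     // Consume the stream of data messages, keep the relevant ones, and
> >     // publish increasingly specific recommendations back onto the same
> >     // per-user topic, keyed by user id.
> >     try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps);
> >          KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
> >         consumer.subscribe(java.util.regex.Pattern.compile("GUEST_DATA\\..*"));
> >         while (true) {
> >             for (ConsumerRecord<String, String> rec : consumer.poll(Duration.ofSeconds(1))) {
> >                 // Discard the data it doesn't need, including its own
> >                 // previously published recommendations (avoids a feedback loop).
> >                 if (!isRelevant(rec.value())) continue;
> >                 String recommendation = recommendFor(rec.key(), rec.value());
> >                 producer.send(new ProducerRecord<>(rec.topic(), rec.key(), recommendation));
> >             }
> >         }
> >     }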
> >
> > You don’t see many systems implemented this way, but since Kafka has
> > so much higher throughput than your typical MOM, this approach seems
> > innovative.
> >
> > The benefits are:
> >
> > 1.  If we start collecting more information about the users, we can
> > simply start publishing that in new data messages, and consumers can
> > start processing those messages whenever they want.  If we were doing
> > this in a more traditional SOA approach, the schemas would need to
> > change every time we added a field, but with this approach we can
> > just create new messages without touching existing ones.
> > 2.  We are looking to
> > make our systems smaller so if we end up with more, smaller systems that
> > each
> > publish a small number of events, it becomes easier to make changes and
> > test
> > the changes.  If we were doing this in a
> > more traditional SOA approach we would need to retest each consumer every
> > time
> > we changed our bigger SOA services.
> >
> > The downsides appear to be:
> >
> > 1.  We may be
> > publishing a large amount of data that never gets used but that everyone
> > needs
> > to consume to see if they need it before discarding it.
> > 2.  The Product Recommendation system may need to wait until it has
> > consumed a number of messages, and keep track of all the data
> > internally, before it can start processing (see the sketch after this
> > list).
> > 3.  While we may be able to keep the messages somewhat small, the
> > fact that they contain data will mean they will be bigger than your
> > traditional EDA messages.
> > 4.  It seems like we can do a lot of this using SOA (we already have
> > an ESB that can do transformations to address consumers expecting an
> > older version of the data).
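> >
> > For downside 2, "keeping track of the data internally" would amount
> > to something like a per-user buffer (a hypothetical fragment assuming
> > java.util imports; the threshold and helpers are made up):
> >
> >     // Buffer data messages per user until enough context has arrived
> >     // to produce a first recommendation.
> >     Map<String, List<String>> pendingData = new HashMap<>();
> >
> >     void onDataMessage(String userId, String dataJson) {
> >         List<String> buffered =
> >             pendingData.computeIfAbsent(userId, k -> new ArrayList<>());
> >         buffered.add(dataJson);
> >         if (buffered.size() >= MIN_MESSAGES_FOR_FIRST_RECOMMENDATION) {
> >             emitRecommendation(userId, buffered);  // hypothetical helper
> >         }
> >     }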
> >
> > Any insight is appreciated.
> > Thanks,
> > Josh
>
>
>
>
> --
> Mahendra
>
> http://twitter.com/mahendra
>
