Also, since you're going to be creating a topic per user, the number of
concurrent users will also be a concern, as Kafka doesn't handle massive
numbers of topics well.

Tim
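The usual way around that topic explosion is a single shared topic with the
user id as the message key, so that partitioning, rather than a topic per
user, keeps each user's messages together and in order. A minimal sketch of
that alternative, assuming the modern Kafka Java producer (which postdates
this 2013 thread); the topic name comes from the discussion below, while the
class name and JSON payload are illustrative:

    import java.util.Properties;

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class GuestDataProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaProducer<String, String> producer =
                    new KafkaProducer<>(props)) {
                // One shared topic for all users; the user id is the message
                // key, so all of a user's data messages hash to the same
                // partition and stay in order, with no topic per user.
                producer.send(new ProducerRecord<>("GUEST_DATA", "user123",
                        "{\"type\":\"address\",\"value\":\"123 Main St\"}"));
            }
        }
    }

Consumers would then filter on the message key rather than subscribing to a
per-user topic.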
On Thu, Jun 13, 2013 at 10:47 AM, Josh Foure <user...@yahoo.com> wrote:

> Hi Mahendra, I think that is where it gets a little tricky. I think it
> would work something like this:
>
> 1. Web sends a login event for user "user123" to topic "GUEST_EVENT".
> 2. All of the systems consume those messages and publish their data
> messages to topic "GUEST_DATA.user123".
> 3. The Recommendation system gets all of the data from
> "GUEST_DATA.user123", processes it, and then publishes back to the same
> topic, "GUEST_DATA.user123".
> 4. The Web consumes the messages from that same topic (there is a
> different topic for every user that has logged in), and when it finds the
> recommendation messages it pushes them to the browser (note that it will
> need to read all the other data messages and discard them while looking
> for the recommendation messages).
>
> I have a concern that the Web will be flooded with a ton of messages that
> it will promptly drop, but I don't want to create a new "response" or
> "recommendation" topic, because then I feel like I am tightly coupling the
> message to the functionality, and in the future different systems may want
> to consume those messages as well.
>
> Does that make sense?
> Josh
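Step 4 of Josh's flow amounts to a consumer that reads everything on the
per-user topic and keeps only the recommendation messages, which is exactly
where the flooding concern comes from. A rough sketch, again assuming the
modern Java consumer API; the group id, the "type" field in the payload, and
pushToBrowser are illustrative stand-ins:

    import java.time.Duration;
    import java.util.List;
    import java.util.Properties;

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class WebRecommendationConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("group.id", "web");
            props.put("key.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer =
                    new KafkaConsumer<>(props)) {
                consumer.subscribe(List.of("GUEST_DATA.user123"));
                while (true) {
                    ConsumerRecords<String, String> records =
                            consumer.poll(Duration.ofMillis(500));
                    for (ConsumerRecord<String, String> record : records) {
                        // Every data message on the user's topic arrives
                        // here; anything that is not a recommendation is
                        // read and then dropped, which is the flooding
                        // concern Josh raises above.
                        if (record.value()
                                .contains("\"type\":\"recommendation\"")) {
                            pushToBrowser(record.value());
                        }
                    }
                }
            }
        }

        // Stand-in for the real push to the user's browser session.
        static void pushToBrowser(String recommendation) {
            System.out.println("push: " + recommendation);
        }
    }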
>
>
> ________________________________
> From: Mahendra M <mahendr...@gmail.com>
> To: users@kafka.apache.org; Josh Foure <user...@yahoo.com>
> Sent: Thursday, June 13, 2013 12:56 PM
> Subject: Re: Using Kafka for "data" messages
>
> Hi Josh,
>
> The idea looks very interesting. I just had one doubt.
>
> 1. A user logs in. His login id is sent on a topic.
> 2. Other systems (consumers on this topic) consume this message and
> publish their results to another topic.
>
> This will be happening, in no particular order, for hundreds of users.
>
> Now, when the site is being displayed to the user, how will you fetch
> only the messages for that user from the queue?
>
> Regards,
> Mahendra
>
>
> On Thu, Jun 13, 2013 at 8:51 PM, Josh Foure <user...@yahoo.com> wrote:
>
> > Hi all, my team is proposing a novel way of using Kafka, and I am
> > hoping someone can help do a sanity check on it:
> >
> > 1. When a user logs into our website, we will create a "logged in"
> > event message in Kafka containing the user id.
> > 2. 30+ systems (consumers, each in their own consumer group) will
> > consume this event and look up data about this user id. They will then
> > publish all of this data back out into Kafka as a series of data
> > messages. One message may include the user's name, another the user's
> > address, another the user's last 10 searches, another their last 10
> > orders, etc. The plan is that a single "logged in" event may trigger
> > hundreds if not thousands of additional data messages.
> > 3. Another system, the "Product Recommendation" system, will have
> > consumed the original "logged in" message and will also consume a
> > subset of the data messages (realistically, I think it would need to
> > consume all of the data messages but discard the ones it doesn't
> > need). As the Product Recommendation system consumes the data
> > messages, it will compute recommended products and publish
> > recommendation messages (which get more and more specific as it
> > consumes more and more data messages).
> > 4. The original website will consume the recommendation messages and
> > show the recommendations to the user as it gets them.
> >
> > You don't see many systems implemented this way, but since Kafka has
> > much higher throughput than your typical MOM, this approach seems
> > innovative.
> >
> > The benefits are:
> >
> > 1. If we start collecting more information about the users, we can
> > simply start publishing that in new data messages, and consumers can
> > start processing those messages whenever they want. If we were doing
> > this in a more traditional SOA approach, the schemas would need to
> > change every time we added a field, but with this approach we can just
> > create new messages without touching existing ones.
> > 2. We are looking to make our systems smaller, so if we end up with
> > more, smaller systems that each publish a small number of events, it
> > becomes easier to make changes and test them. If we were doing this in
> > a more traditional SOA approach, we would need to retest each consumer
> > every time we changed our bigger SOA services.
> >
> > The downsides appear to be:
> >
> > 1. We may be publishing a large amount of data that never gets used
> > but that everyone needs to consume to see if they need it before
> > discarding it.
> > 2. The Product Recommendation system may need to wait until it has
> > consumed a number of messages, keeping track of all the data
> > internally, before it can start processing.
> > 3. While we may be able to keep the messages somewhat small, the fact
> > that they contain data means they will be bigger than your traditional
> > EDA messages.
> > 4. It seems like we can do a lot of this using SOA (we already have an
> > ESB that can do transformations to address consumers expecting an
> > older version of the data).
> >
> > Any insight is appreciated.
> > Thanks,
> > Josh
>
>
> --
> Mahendra
>
> http://twitter.com/mahendra
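Steps 1 and 2 of the original proposal describe a fan-out: each of the 30+
systems consumes the login event in its own consumer group and publishes
what it knows about the user back as data messages. A minimal sketch of one
such enricher, using the topic names from the thread; the modern Java client
is assumed, and the class name, group id, payload format, and address lookup
are hypothetical:

    import java.time.Duration;
    import java.util.List;
    import java.util.Properties;

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class AddressEnricher {
        public static void main(String[] args) {
            Properties cProps = new Properties();
            cProps.put("bootstrap.servers", "localhost:9092");
            // Each of the 30+ systems uses its own group id, so every
            // system sees every login event.
            cProps.put("group.id", "address-service");
            cProps.put("key.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");
            cProps.put("value.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");

            Properties pProps = new Properties();
            pProps.put("bootstrap.servers", "localhost:9092");
            pProps.put("key.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
            pProps.put("value.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaConsumer<String, String> consumer =
                         new KafkaConsumer<>(cProps);
                 KafkaProducer<String, String> producer =
                         new KafkaProducer<>(pProps)) {
                consumer.subscribe(List.of("GUEST_EVENT"));
                while (true) {
                    for (ConsumerRecord<String, String> event :
                            consumer.poll(Duration.ofSeconds(1))) {
                        // Assume the login event body is the user id.
                        String userId = event.value();
                        // Step 2: publish what this system knows about the
                        // user as a data message on the per-user topic.
                        producer.send(new ProducerRecord<>(
                                "GUEST_DATA." + userId, userId,
                                "{\"type\":\"address\",\"value\":\""
                                        + lookUpAddress(userId) + "\"}"));
                    }
                }
            }
        }

        // Hypothetical lookup; stands in for a real datastore call.
        static String lookUpAddress(String userId) {
            return "123 Main St";
        }
    }

Note that this sketch inherits downside 1 above: every enriched message is
published whether or not any consumer ends up needing it.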