Hi David,

Thank you for your comments. My concern with that idea is that having only one topic will slow a lot of things down. I am assuming there will be at least 6~7 physical consumers, so I think I can safely have more than one topic (a separate topic per operation, perhaps?).
Also, with your approach, wouldn't partitions be created for 100 million users? As far as I know, partitions work through file I/O, which means that would slow the entire system down (am I even correct on this?).

It is all a matter of making sure user A's activity does not block user B.

Thank you for your answers!

On Wed, Jul 6, 2016 at 12:24 AM, David Newberger <david.newber...@wandcorp.com> wrote:

> Hi,
>
> I think the recommended approach to this would be to have a single topic
> and partition it by userId. This will give you locality and order by user.
> If you think about it, this would give you a better ordering guarantee than
> if you had one topic per user. It's also a lot more efficient. If you are
> using Kafka as a log or messaging system, you really should not need
> millions of topics or partitions. If I'm misunderstanding the use case,
> please let me know.
>
> Cheers,
>
> David Newberger
>
> -----Original Message-----
> From: Hyounmin Wang [mailto:hyunmi...@gmail.com]
> Sent: Tuesday, July 5, 2016 1:50 AM
> To: users@kafka.apache.org
> Subject: Kafka Beginners planning problem.
>
> Hi there!
>
> I'm a new grad engineer and am pretty new to the Kafka world.
>
> I'm trying to replace RabbitMQ with Apache Kafka, and while planning I
> ran into several conceptual planning problems.
>
> First, we are using RabbitMQ with a per-user queue policy, meaning each user
> uses one queue. This suits our needs because each user represents some job to
> be done for that particular user, and if that user causes a problem, it
> will never affect other users because the queues are
> separated. (A problem here means the following: messages in the queue are
> dispatched to the users using HTTP requests. If a user refuses to receive a
> message (server down, perhaps?), it goes back into a retry queue, so no
> messages are lost unless the queue itself goes down.)
>
> Now, Kafka is fault tolerant and failure safe because it writes to disk.
> And that is exactly why I am trying to bring Kafka into our architecture.
>
> But there are problems with my plan.
>
> First, I was thinking of creating one topic per user, meaning each
> user would have their own topic. (What problems will this cause? My maximum
> estimate is that I will have around 1~5 million topics.)
>
> Second, if I decide to go for topics based on operation and partition by
> a hash of the user's id, and there is a problem with one user not currently
> consuming messages, will all the users in that partition have to wait? What
> would be the best way to structure this situation?
>
> So, in conclusion: 1~5 million users. We do not want one user
> blocking a large number of other users from being processed. Having a topic
> per user would solve this issue, but it seems there might be an issue with
> ZooKeeper if such a large number of topics is created. (Is this true?)
>
> What would be the best way to structure this, considering scalability?
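
To make the "single topic, partitioned by userId" suggestion in the quoted reply more concrete, here is a minimal sketch of how a keyed message could be mapped onto a small, fixed number of partitions. It does not talk to Kafka at all. The partition count (12), the `user-N` id format, and the function names are assumptions made up for illustration, and `zlib.crc32` is only a stand-in hash (Kafka's default partitioner for keyed records uses murmur2, not crc32).

```python
import zlib
from collections import Counter

# Hypothetical partition count, chosen only for illustration.
NUM_PARTITIONS = 12


def partition_for(user_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a user id to a partition index with a stable hash.

    Stand-in for a keyed producer's partitioner. crc32 is used because it
    ships with the standard library; it is NOT the hash Kafka itself uses.
    """
    return zlib.crc32(user_id.encode("utf-8")) % num_partitions


def spread(user_count: int, num_partitions: int = NUM_PARTITIONS) -> Counter:
    """Count how many synthetic users land on each partition."""
    return Counter(
        partition_for(f"user-{i}", num_partitions) for i in range(user_count)
    )


if __name__ == "__main__":
    # The same user always maps to the same partition, so that user's
    # messages stay in one ordered log.
    print("user-42 ->", partition_for("user-42"))

    # Many users share a handful of partitions; the partition count does
    # not grow with the number of users.
    counts = spread(100_000)
    for partition in sorted(counts):
        print(partition, counts[partition])
```

Under these assumptions the number of partitions stays fixed no matter how many users there are, and each user's messages keep their relative order because they always land in the same partition. The flip side is the concern raised above: users that hash to the same partition share one ordered log, so a consumer that stalls on one user's message can delay the other users in that partition unless failed messages are moved aside (for example to a separate retry topic).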