Deja vu! IMO, what you are describing is a database problem, even though you are talking/thinking about it as a queue problem. I'm sure you could construct something using Kafka (and Samza), but I think you'd have an easier time with a database. The number of pending messages per user and the average message size would be critical in selecting exactly which sort of database to use.
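To make the database framing concrete, here is a minimal sketch of a per-user pending-message store. It assumes SQLite purely for illustration; the table name `pending`, the index name, and the helpers `open_store`, `publish`, and `drain` are all hypothetical, and a real choice of database would depend on the message counts and sizes mentioned above.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the hypothetical pending-message store."""
    conn = sqlite3.connect(path)
    # seq is a monotonically increasing id, so it doubles as the delivery order.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pending ("
        " seq INTEGER PRIMARY KEY AUTOINCREMENT,"
        " user_id INTEGER NOT NULL,"
        " body TEXT NOT NULL)"
    )
    # Index so that fetching one user's backlog does not scan every user's rows.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS pending_by_user ON pending (user_id, seq)"
    )
    return conn


def publish(conn, user_id, body):
    """Store a message for a user, whether or not the user is connected."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO pending (user_id, body) VALUES (?, ?)",
            (user_id, body),
        )


def drain(conn, user_id):
    """Return the user's pending messages in publish order and delete them."""
    with conn:
        rows = conn.execute(
            "SELECT seq, body FROM pending WHERE user_id = ? ORDER BY seq",
            (user_id,),
        ).fetchall()
        if rows:
            # Delete only what was read, up to the highest seq returned.
            conn.execute(
                "DELETE FROM pending WHERE user_id = ? AND seq <= ?",
                (user_id, rows[-1][0]),
            )
    return [body for _, body in rows]
```

A user who reconnects would call `drain` with their id and receive everything published while they were away, in order. Nothing here needs one partition (or one of anything) per user; the user id is just an indexed column.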
My $0.02.

On Thu, Dec 5, 2013 at 7:47 PM, mission mission <[email protected]> wrote:

> Hello,
>
> According to the Kafka FAQ "How do I choose the number of partitions for a
> topic", clusters with more than 10K partitions are not tested. I am looking
> for advice on how to scale the number of partitions beyond that. My use
> case is to publish messages to 1 million users, each with an unique user
> id. Users are not always connected but a user must receive published
> messages in order.
>
> What is the best way to divide topics and partitions for this use case? Do
> I need 1 million partitions? The FAQ seems to think so, i.e. "if we were
> storing notifications for users we would encourage a design with a single
> notifications topic partitioned by user id". But the FAQ implies strongly
> that 1 million partitions may wreak havoc on zookeeper because they will
> lead to X million znodes that have to be stored in memory. Any suggestions?
>
> Thanks,
>
> mission
