Hi

I have read Jay Kreps' post regarding the number of topics that can be handled 
by a broker 
(https://www.quora.com/How-many-topics-can-be-created-in-Apache-Kafka), and it 
has left me with more questions that I don't see answered anywhere else.


We have a data stream which will be consumed by many consumers (~400).  We also 
have many "groups" within our data.  A group in the data corresponds 1:1 with 
what a consumer would consume, so consumer A only ever sees group A messages, 
consumer B only ever sees group B messages, etc.


The downstream consumers will be consuming via a websocket API, so the API 
server will be the thing consuming from Kafka.


If I use a single topic with, say, 20 partitions, the API server would need a 
separate Kafka consumer for each downstream consumer, and each of those would 
re-read every message just to pick out its own group.  That seems like a waste 
of network and a potential bottleneck.
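
To put a rough number on that, here is a minimal sketch of the read amplification, assuming every downstream consumer gets its own Kafka consumer that reads the whole topic and filters for its group, and assuming messages are spread evenly across groups. The message rate and size are made-up placeholders, not measurements from our system.

```python
def fetch_bytes_per_sec(num_consumers: int, msgs_per_sec: int, avg_msg_bytes: int) -> int:
    """Bytes fetched from the brokers per second when every consumer
    reads the full topic (each message is fetched once per consumer)."""
    return num_consumers * msgs_per_sec * avg_msg_bytes


def useful_fraction(num_groups: int) -> float:
    """Share of fetched messages a consumer actually keeps, assuming
    messages are spread evenly across groups (an assumption)."""
    return 1.0 / num_groups


# Hypothetical workload figures, for illustration only.
consumers = 400
msgs_per_sec = 1_000
avg_msg_bytes = 500

produced = msgs_per_sec * avg_msg_bytes
fetched = fetch_bytes_per_sec(consumers, msgs_per_sec, avg_msg_bytes)

print(f"produced: {produced} B/s")
print(f"fetched:  {fetched} B/s ({fetched // produced}x amplification)")
print(f"kept per consumer: {useful_fraction(consumers):.2%} of what it reads")
```

Whatever the real rates are, the fetch traffic scales with the number of downstream consumers rather than with the amount of data produced.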


Alternatively, I could use a single topic with 20 partitions and have a single 
consumer in the API put the messages into Cassandra/Redis (as suggested by 
Jay), and serve out the downstream consumer streams that way.  However, that 
requires using a secondary sorted storage, which seems like a waste (and added 
complexity) given that Kafka already has the data exactly as I need it, 
especially if Cassandra/Redis would have to maintain a long TTL on the stream.


Finally, I could use 1 topic per group, each with a single partition.  This 
would result in 400 topics on the broker, but would allow the API server to 
simply serve the stream for each consumer directly from Kafka, and it would 
not require additional machinery to serve out the requests.
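
As a rough sketch of what that layout would look like, the snippet below maps each group to its own single-partition topic and compares partition counts with the single-topic option. The "stream." prefix and the group ids are hypothetical names I made up for illustration, and replication is ignored.

```python
TOPIC_PREFIX = "stream."  # hypothetical naming convention


def topic_for_group(group_id: str) -> str:
    """Name of the dedicated topic that holds one group's messages."""
    return f"{TOPIC_PREFIX}{group_id}"


def topic_layout(group_ids, partitions_per_topic: int = 1) -> dict:
    """Map every topic name to its partition count (replication ignored)."""
    return {topic_for_group(g): partitions_per_topic for g in group_ids}


# Hypothetical group ids, one per downstream consumer.
groups = [f"group-{i}" for i in range(400)]
layout = topic_layout(groups)

total_partitions = sum(layout.values())
print(f"topics: {len(layout)}, partitions: {total_partitions}")
print(f"single-topic alternative: 1 topic, 20 partitions")
print(f"consumer for group-7 reads only: {topic_for_group('group-7')}")
```

With this layout each websocket connection reads only its own topic, so what is fetched from the broker matches what is served, at the cost of 400 partitions instead of 20.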


The 400-topic solution makes the most sense to me (it doesn't require extra 
services and doesn't waste resources), but it seems to conflict with best 
practices, so I wanted to ask the community for input.  Has anyone done this 
before?  What makes the most sense here?




Thanks


Shaun
