Hello Kafka Friends,

We are considering a use case where we'd like a Kafka cluster with potentially thousands of partitions, using a hashed key on customer userids. We have heard that Kafka can support thousands of partitions in a single cluster, and I wanted to find out whether it's actually reasonable to run with that many.
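To illustrate what I mean by the hashed key, the producers would do something like the sketch below, relying on the default partitioner hashing the record key so that all of a given user's records land in the same partition. This is just a rough sketch with the standard Java client; the topic name and broker address are placeholders.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class UserLogProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");   // placeholder broker
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Keying each record by userid lets the default partitioner hash the key,
            // so every record for the same user ends up in the same partition.
            String userId = "user-12345";
            String logLine = "some low-volume log data";
            producer.send(new ProducerRecord<>("user-logs", userId, logLine));
        }
    }
}
```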
Additionally, we'd like to run potentially hundreds of thousands of consumers, each consuming a fairly low volume of log data from these partitions, and I'd also like to know whether having that many consumers is reasonable or recommended with Kafka.

The scenario is roughly this: we have 100,000 to 200,000 customers whose data would be sharded by userid into a cluster of, say, 4000 partitions, and we'd run one consumer per userid to consume that user's log data (see the sketch in the P.S. below). Assuming 100,000 userids, that works out to 100,000 / 4000 = 25 consumers per partition, where each consumer reads every offset and ignores any record whose key doesn't match its assigned userid.

My gut feeling is that this may not be a sound solution, because we'd need a ton of open file descriptors and there could be a lot of overhead on Kafka managing that volume of consumers. Any guidance is appreciated; mainly I'm just looking to see whether this is a reasonable use of Kafka or whether we need to go back to the drawing board. I appreciate any help!

-Ralph
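P.S. To make the consumer side concrete, here's roughly what I picture each per-userid consumer doing. This is only a sketch with the standard Java client; the topic name, broker address, and partition count are placeholders, and the partition is computed the same way the default producer partitioner does so the consumer pins itself to the partition its userid hashes to and then filters on key.

```java
import java.nio.charset.StandardCharsets;
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.utils.Utils;

public class PerUserLogConsumer {
    private static final String TOPIC = "user-logs";   // placeholder topic name
    private static final int NUM_PARTITIONS = 4000;    // cluster-wide partition count

    public static void main(String[] args) {
        String userId = args[0];  // the single userid this process consumes for

        // Mirror the default partitioner's hash so we land on the partition
        // that holds this userid's records.
        byte[] keyBytes = userId.getBytes(StandardCharsets.UTF_8);
        int partition = Utils.toPositive(Utils.murmur2(keyBytes)) % NUM_PARTITIONS;

        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");   // placeholder broker
        props.put("group.id", "per-user-" + userId);       // one group per userid
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.assign(Collections.singletonList(new TopicPartition(TOPIC, partition)));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // Read every offset in the shared partition, but only act on
                    // records keyed by our userid; skip the other users' data.
                    if (userId.equals(record.key())) {
                        handle(record.value());
                    }
                }
            }
        }
    }

    private static void handle(String logLine) {
        System.out.println(logLine);
    }
}
```

So with ~25 userids hashing to each partition, every one of those 25 consumers would be pulling the full partition and throwing away roughly 24/25ths of what it reads, which is exactly the part that worries me.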