Take a look at: https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-HowdoIchoosethenumberofpartitionsforatopic?
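
For a rough first number, one common rule of thumb (not a formula from the FAQ itself) is to measure what a single partition can sustain on the producer side and on the consumer side, then take enough partitions that neither side becomes the bottleneck. The sketch below applies that to the figures in your mail. Note that 4 TB/day averages out to about 46 MB/s while 13 GB/min is about 217 MB/s, so I size for the higher one. The per-partition throughput values are placeholders I made up; replace them with measurements from your own cluster.

```python
import math


def estimate_partitions(target_mb_s, producer_mb_s_per_partition,
                        consumer_mb_s_per_partition):
    """Rule-of-thumb partition count: enough partitions that neither
    the producer side nor the consumer side limits the target rate."""
    by_producer = math.ceil(target_mb_s / producer_mb_s_per_partition)
    by_consumer = math.ceil(target_mb_s / consumer_mb_s_per_partition)
    return max(by_producer, by_consumer)


# Rates taken from the question (decimal units: 1 GB = 1000 MB).
daily_avg_mb_s = 4 * 1000 * 1000 / 86400   # 4 TB/day   -> about 46.3 MB/s
peak_mb_s = 13 * 1000 / 60                 # 13 GB/min  -> about 216.7 MB/s

# ASSUMPTION: hypothetical per-partition throughput, not measured values.
PRODUCER_MB_S_PER_PARTITION = 10.0
CONSUMER_MB_S_PER_PARTITION = 20.0

partitions = estimate_partitions(peak_mb_s,
                                 PRODUCER_MB_S_PER_PARTITION,
                                 CONSUMER_MB_S_PER_PARTITION)
print(f"average {daily_avg_mb_s:.1f} MB/s, peak {peak_mb_s:.1f} MB/s, "
      f"suggested partitions: {partitions}")
```

Treat the result as a lower bound for throughput only. It says nothing about how many consumers you want to run in parallel or about per-broker overhead, which the FAQ entry above discusses.
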
On Fri, May 23, 2014 at 12:49:39PM -0700, Bhavesh Mistry wrote:
> Hi Kafka Users,
>
> We are trying to transport 4TB data per day on single topic. It is
> operation application logs. How do we estimate number of partitions and
> partitioning strategy? Our goal is to drain (from consumer side) from
> the Kafka Brokers as soon as messages arrive (keep the lag as minimum as
> possible) and also we would like to uniformly distribute the logs across
> all partitions.
>
> Here is our Brokers HW Spec:
>
> 3 Broker Cluster (192 GB RAM, 32 Cores each with SSD to hold 7 days of
> data) with 100G NIC
>
> Data Rate : ~ 13 GB per minute
>
> Is there a formula to compute optimal number of partition need ? Also, how
> to ensure uniform distribution from the producer side (currently we have
> counter % numPartitions which is not viable solution in prod env)
>
> Thanks,
> Bhavesh