At LinkedIn, some of the high-volume topics are configured with more than one partition per broker. Having more partitions increases I/O parallelism for writes and also increases the degree of parallelism for consumers, since the partition is the unit for distributing data to consumers: within a consumer group, each partition is consumed by at most one consumer thread, so threads beyond the partition count sit idle. On the other hand, more partitions add some overhead: (a) there will be more files and thus more open file handles; (b) there are more offsets to be checkpointed by consumers, which can increase the load on ZooKeeper (ZK). So you want to balance these tradeoffs.
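To make the "partition is the unit of distribution" point concrete, here is a minimal sketch of a range-style assignment of one topic's partitions to the consumer threads in a group. `range_assign` is a hypothetical helper written for illustration. It is a simplified approximation of range assignment, not Kafka's actual rebalancing code, and it ignores brokers, multiple topics, and rebalancing.

```python
def range_assign(num_partitions, consumers):
    """Hypothetical, simplified range-style assignment of one topic's
    partitions to the consumer threads of a single consumer group."""
    if not consumers:
        raise ValueError("need at least one consumer")
    consumers = sorted(consumers)
    per_consumer, extra = divmod(num_partitions, len(consumers))
    assignment = {}
    start = 0
    for i, consumer in enumerate(consumers):
        # The first `extra` consumers each take one additional partition.
        count = per_consumer + (1 if i < extra else 0)
        assignment[consumer] = list(range(start, start + count))
        start += count
    return assignment


# With a single partition, only one of four threads receives any data.
print(range_assign(1, ["c0", "c1", "c2", "c3"]))
# {'c0': [0], 'c1': [], 'c2': [], 'c3': []}

# With 20 partitions, the four threads share the load evenly.
print({c: len(p) for c, p in range_assign(20, ["c0", "c1", "c2", "c3"]).items()})
# {'c0': 5, 'c1': 5, 'c2': 5, 'c3': 5}
```

The same arithmetic points to the overhead side. Each extra partition is one more log to keep open on its broker and one more offset that every consumer group checkpoints, so the partition count is worth sizing to the consumer parallelism you actually need.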
Thanks,
Jun

On Mon, Jan 14, 2013 at 11:55 PM, Andrew Psaltis <andrew.psal...@webtrends.com> wrote:

> All,
>
> I was re-reading this:
> https://cwiki.apache.org/confluence/display/KAFKA/Operations and noticed
> that the number of partitions is 1. Is this accurate? In our environment we
> are currently running 20+ partitions per topic - with two brokers, the gut
> feel was this would speed up our ability to read from many threads in a
> consumer group. What I am lacking is a true understanding of the pros/cons
> of having more partitions on a given broker, what are they? Are there
> guidelines to follow in setting up the partitions?
>
> Thanks in advance,
> Andrew