You probably want to think of this in terms of the number of partitions on a
single broker rather than per topic, since I/O is the limiting factor in
this case. Another factor to consider is the total number of partitions in
the cluster, as ZooKeeper becomes a limiting factor there. 30 partitions is
not too large, provided the total number of partitions in the cluster
doesn't exceed roughly a couple thousand. To give you an example, some of
our clusters have 16 nodes, and some of the topics on those clusters have
30 partitions.
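As a back-of-the-envelope check on the two quantities mentioned above, the arithmetic can be sketched as below. The topic count and replication factor used in the example are made-up illustrations, not figures from this thread:

```python
def total_partitions(num_topics: int, partitions_per_topic: int,
                     replication_factor: int = 1) -> int:
    # Cluster-wide partition-replica count -- the quantity ZooKeeper
    # has to track, which should stay in the low thousands.
    return num_topics * partitions_per_topic * replication_factor

def partitions_per_broker(num_topics: int, partitions_per_topic: int,
                          replication_factor: int, num_brokers: int) -> float:
    # Average partition replicas hosted per broker -- the I/O-bound quantity.
    return total_partitions(num_topics, partitions_per_topic,
                            replication_factor) / num_brokers

# Hypothetical cluster: 40 topics x 30 partitions x 2 replicas on 16 brokers.
total = total_partitions(40, 30, 2)                # cluster-wide replica count
per_broker = partitions_per_broker(40, 30, 2, 16)  # average load per broker
```

Under those assumed numbers, the cluster-wide count lands at 2,400 replicas, right around the "couple thousand" ceiling, so the per-topic figure of 30 is only safe as long as the topic count stays modest.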

Thanks,
Neha
On Oct 4, 2013 4:15 AM, "Aniket Bhatnagar" <aniket.bhatna...@gmail.com>
wrote:

> I am using Kafka as a buffer for data streaming in from various sources.
> Since it's time-series data, I generate the key for each message by
> combining the source ID and the minute from the timestamp. This means I
> can have at most 60 partitions per topic (as each source has its own
> topic). I have set num.partitions to 30 (60/2) for each topic in the
> broker config. I don't have a very good reason for picking 30 as the
> default number of partitions per topic, but I wanted it to be a high
> number so that I can achieve high parallelism during in-stream processing.
> I am worried that a high number like 30 (the default configuration had it
> as 2) might negatively impact Kafka's performance in terms of message
> throughput or memory consumption. I understand that this can lead to many
> files per partition, but I am thinking of dealing with that by having
> multiple directories on the same disk if I run into issues.
>
> My question to the community: am I prematurely optimizing the partition
> count, given that right now even 5 partitions seem sufficient, and will I
> run into unwanted issues? Or is 30 an OK number of partitions to use?
>
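The keying scheme described in the question can be sketched as below. The source-ID format and the crc32 hash are illustrative assumptions (Kafka's actual default partitioner uses a different hash); the point is that a minute-of-hour key yields at most 60 distinct keys per source, which caps the number of partitions that can ever receive data:

```python
import zlib
from datetime import datetime

def message_key(source_id: str, ts: datetime) -> str:
    # Combine the source ID and the minute-of-hour from the timestamp,
    # as described above; at most 60 distinct keys per source.
    return f"{source_id}:{ts.minute:02d}"

def partition_for(key: str, num_partitions: int = 30) -> int:
    # Stand-in for Kafka's key-based partitioner (not the real algorithm);
    # any deterministic hash mod num_partitions shows the same effect.
    return zlib.crc32(key.encode()) % num_partitions

# Over a full day of per-minute messages, only 60 distinct keys appear,
# so with num.partitions=30 every partition can be reached, but with
# num.partitions > 60 some partitions would stay permanently empty.
keys = {message_key("source-1", datetime(2013, 10, 4, h, m))
        for h in range(24) for m in range(60)}
partitions_hit = {partition_for(k) for k in keys}
```

With 30 partitions and 60 keys, each partition receives roughly two keys' worth of traffic; raising num.partitions beyond 60 would buy no extra parallelism under this keying scheme.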
