I currently have a topic with 1024 partitions. I know that's somewhat past
the recommended limits, but I kept it that way because I am migrating a
legacy system to Kafka and it has 1024 parallel partitions. I wanted to
understand the costs of having so many partitions a little better, though.
I understand that the brokers will be doing a lot more random disk IO with
that many partitions unless I also have that many parallel disks in my
cluster.

What are the effects on the producer, though? I am using the 0.8.2 beta
client. Ideally it would batch all requests going to a broker, even when
they are for different partitions. Is that a correct assumption? If the
producer instead flushes on a per-partition basis, the results could be
disastrous: the number of messages per partition is probably very low, say
15/second or so, but the number of messages per broker will be
significantly higher, around 3,500-4,000 messages per second.
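
For context, here is a minimal sketch of the producer setup I have in mind.
It is written against the new producer API as documented for 0.8.2 (the
beta may differ slightly), and the broker address, topic name, and config
values are placeholders, not what I actually run. My understanding is that
batch.size is a per-partition byte limit, while linger.ms lets records
accumulate before a send, so linger.ms seems like the main knob if batching
really is per partition rather than per broker:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ManyPartitionProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Placeholder broker address.
        props.put("bootstrap.servers", "broker1:9092");
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.ByteArraySerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.ByteArraySerializer");
        // batch.size is a per-partition limit in bytes; at ~15 msgs/sec
        // per partition a batch would rarely fill on its own.
        props.put("batch.size", "16384");
        // linger.ms delays the send so more records can accumulate,
        // which matters if batches form per partition, not per broker.
        props.put("linger.ms", "50");

        KafkaProducer<byte[], byte[]> producer =
                new KafkaProducer<byte[], byte[]>(props);
        // Placeholder topic; the default partitioner spreads records
        // across all 1024 partitions.
        producer.send(new ProducerRecord<byte[], byte[]>(
                "legacy-topic", "key".getBytes(), "value".getBytes()));
        producer.close();
    }
}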

Thanks,
Rajiv
