I currently have a topic with 1024 partitions. I know that goes past the recommended limits, but I kept it that way because I am migrating a legacy system to Kafka, and that system has 1024 parallel partitions. I wanted to understand the costs of having so many partitions a bit better, though. I understand that the brokers will be doing a lot more random disk IO with that many partitions unless I also have that many parallel disks in my cluster.
What are the effects on the producer, though? I am using the 0.8.2 beta client. Ideally it would batch all requests going to a broker, even if they are for different partitions. Is that a correct assumption? If the producer instead flushes on a per-partition basis, the results could be disastrous for us, since the message rate per partition is very low, say 15/second or so, while the rate per broker is significantly higher, around 3500-4000 messages per second.

Thanks,
Rajiv
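P.S. For reference, here is roughly how I am setting up the new producer. This is a minimal sketch: the topic name, broker hosts, and property values are placeholders, and I am going by the new producer's config docs, so the exact API may differ slightly in the beta.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Placeholder hosts, not my real cluster.
        props.put("bootstrap.servers", "broker1:9092,broker2:9092");
        props.put("key.serializer",
            "org.apache.kafka.common.serialization.ByteArraySerializer");
        props.put("value.serializer",
            "org.apache.kafka.common.serialization.ByteArraySerializer");
        // My reading is that batch.size is a per-partition byte limit; at
        // ~15 msgs/sec/partition a batch will rarely fill, so linger.ms is
        // what would actually trigger sends.
        props.put("batch.size", "16384");
        props.put("linger.ms", "5");

        KafkaProducer<byte[], byte[]> producer =
            new KafkaProducer<byte[], byte[]>(props);
        // The partitioner picks one of the 1024 partitions; my assumption
        // (the thing I am asking about) is that the sender thread then
        // groups all ready batches for the same broker into one request.
        producer.send(new ProducerRecord<byte[], byte[]>(
            "my-topic", "key".getBytes(), "value".getBytes()));
        producer.close();
    }
}

If batch.size really is per partition, then with 1024 mostly idle batches I am relying entirely on linger.ms to bound latency, which is why the per-broker vs. per-partition question matters so much to me.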