(inline)

On Mon, Dec 15, 2014 at 11:45:07AM -0800, Rajiv Kurian wrote:
> I currently have a topic with 1024 partitions. I know it's kind of going
> past the recommended limits, but I kept it like that because I am moving a
> legacy system to kafka and it has a 1024 parallel partitions. I wanted to
> understand the costs of having so many partitions a little bit more though.
> I understand that the brokers will be doing a lot more random disk IO with
> that many partitions unless I also have that many parallel disks in my
> cluster.

See
https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-HowmanytopicscanIhave?
and
https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-HowdoIchoosethenumberofpartitionsforatopic?

It is true that more partitions will lead to more random disk IO but in
practice you will still benefit from hitting pagecache provided most of
your consumers are mostly caught up - which also correlates to how high
the incoming message rate is.

>
> What are the effects on the producer though? I am using the 0.8.2 Beta
> client. Ideally it would batch all requests going to a broker, even though
> they might be for different partitions. Is that a correct assumption. If

Yes - it will send out multi-topic-partition producer requests for each
broker.

-- 
Joel
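
P.S. To make the per-broker batching concrete, here is a rough sketch of the idea. It is not the actual producer code, just an illustration of grouping pending records by the broker that leads each topic-partition. The function name `group_by_broker`, the topic name "legacy", the 4-broker cluster, and the round-robin leader assignment are all assumptions made up for the example.

```python
from collections import defaultdict


def group_by_broker(records, leader_for):
    """Group pending records into one request per broker.

    records:    iterable of (topic, partition, payload) tuples
    leader_for: dict mapping (topic, partition) -> broker id
                (hypothetical stand-in for cluster metadata)
    """
    requests = defaultdict(lambda: defaultdict(list))
    for topic, partition, payload in records:
        broker = leader_for[(topic, partition)]
        # Records for different partitions share a request
        # as long as the same broker leads those partitions.
        requests[broker][(topic, partition)].append(payload)
    return {broker: dict(tps) for broker, tps in requests.items()}


# Assumed layout: 1024 partitions spread round-robin over 4 brokers.
NUM_PARTITIONS = 1024
NUM_BROKERS = 4
leader_for = {("legacy", p): p % NUM_BROKERS for p in range(NUM_PARTITIONS)}

# One pending message per partition.
records = [("legacy", p, f"msg-{p}") for p in range(NUM_PARTITIONS)]

requests = group_by_broker(records, leader_for)
for broker, tps in sorted(requests.items()):
    print(f"broker {broker}: 1 request covering {len(tps)} partitions")
```

Under these assumptions the number of requests tracks the number of brokers rather than the number of partitions, which is the point of the answer above.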