(inline)

On Mon, Dec 15, 2014 at 11:45:07AM -0800, Rajiv Kurian wrote:
> I currently have a topic with 1024 partitions. I know it's kind of going
> past the recommended limits, but I kept it like that because I am moving a
> legacy system to Kafka and it has 1024 parallel partitions. I wanted to
> understand the costs of having so many partitions a little bit more though.
> I understand that the brokers will be doing a lot more random disk IO with
> that many partitions unless I also have that many parallel disks in my
> cluster.

See
https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-HowmanytopicscanIhave?
and
https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-HowdoIchoosethenumberofpartitionsforatopic?

It is true that more partitions will lead to more random disk IO, but
in practice reads will still mostly be served from the page cache as
long as most of your consumers stay close to caught up, which in turn
depends on how high the incoming message rate is.

> 
> What are the effects on the producer though? I am using the 0.8.2 Beta
> client. Ideally it would batch all requests going to a broker, even though
> they might be for different partitions. Is that a correct assumption? If

Yes - it will send out multi-topic-partition producer requests for
each broker.
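
To make the grouping concrete, here is a rough sketch in Python. It is
not the actual producer code, only an illustration of the idea: records
for many partitions are bucketed by the broker that leads each
partition, so each broker gets one request covering several
topic-partitions. The names group_by_leader and leader_of, the topic
"events" and the 4-broker round-robin layout are all made up for the
example.

```python
from collections import defaultdict


def group_by_leader(records, leader_of):
    """Bucket (topic, partition, payload) records into one request per broker.

    leader_of maps (topic, partition) -> broker id (hypothetical metadata).
    Returns {broker_id: {(topic, partition): [payload, ...]}}.
    """
    requests = defaultdict(lambda: defaultdict(list))
    for topic, partition, payload in records:
        broker = leader_of[(topic, partition)]
        requests[broker][(topic, partition)].append(payload)
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {broker: dict(parts) for broker, parts in requests.items()}


# Assumed layout: 1024 partitions spread round-robin over 4 brokers.
leader_of = {("events", p): p % 4 for p in range(1024)}

# One pending record per partition.
records = [("events", p, f"msg-{p}") for p in range(1024)]

requests = group_by_leader(records, leader_of)

# Number of requests equals the number of brokers, not partitions.
print(len(requests))
# Each request carries the partitions led by that broker.
print(len(requests[0]))
```

With this layout the 1024 partitions collapse into 4 requests, one per
broker, each carrying 256 topic-partitions.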

-- 
Joel
