Hi,

This will be with Kafka 0.8. That is some good guidance, thank you. To summarize: we can scale the number of hosts/disks as high as we want, but we should keep an eye on the total number of partitions being handled. We've currently configured a default of 4 partitions per topic, so we'll start watching closely once we pass 250 topics (about 1,000 partitions). That should give us plenty to work with.
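For reference, that per-topic default is just the stock broker setting for auto-created topics (assuming the standard 0.8 server.properties layout):

    # server.properties on each broker
    # default partition count for topics created on first use
    num.partitions=4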
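We've also boiled the guidance down into a quick back-of-the-envelope script for our planning notes (a hypothetical sketch: the disk counts are made up, and the 1,000-partition figure is just the low end of the "thousands" mentioned below, not a hard limit):

    # Rules of thumb from this thread, in Python:
    #   1. Throughput scales roughly linearly with the number of disks, so
    #      size the destination cluster's disks to the sum of the sources'.
    #   2. Watch the total partition count; with thousands of partitions,
    #      leader election and consumer rebalancing start to slow down.

    PARTITIONS_PER_TOPIC = 4      # our num.partitions default
    SOFT_PARTITION_LIMIT = 1000   # low end of "thousands", not a hard limit

    def destination_disks(source_disk_counts):
        """Disks the destination cluster needs to absorb all source traffic."""
        return sum(source_disk_counts)

    def partition_budget_ok(num_topics, per_topic=PARTITIONS_PER_TOPIC):
        """True while the total partition count stays under the soft limit."""
        return num_topics * per_topic < SOFT_PARTITION_LIMIT

    # Hypothetical: two source clusters of 5 brokers x 8 disks each.
    print(destination_disks([5 * 8, 5 * 8]))  # 80 disks in the destination
    print(partition_budget_ok(249))           # True: 996 partitions
    print(partition_budget_ok(250))           # False: 1000, time to watch closely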
Thanks!

Scott Arthur

On Fri, Aug 2, 2013 at 10:49 PM, Jay Kreps <jay.kr...@gmail.com> wrote:
> Hi Scott,
>
> What version of Kafka is this?
>
> In general our throughput will scale linearly with the number of machines,
> or more specifically the number of disks. The bottleneck will really be
> the number of partitions. With thousands of partitions, leader election
> can get slower (seconds), and if you have consumers that consume all
> partitions, the rebalancing in these consumers can get slow (minutes).
>
> We hope to fix these issues, but that is the current state up through 0.8.
>
> -Jay
>
>
> On Fri, Aug 2, 2013 at 2:27 PM, Scott Arthur <sart...@salesforce.com> wrote:
>
> > Hi,
> >
> > I have a question about scaling the broker count of a Kafka cluster. We
> > have a scenario where we'll have two clusters replicating data into a
> > third. We're wondering how we should size that third cluster so that it
> > can handle the volume of messages from the two source clusters. Should
> > we just make the number of brokers match? E.g. five brokers in each of
> > the two source clusters, therefore 10 in the destination cluster. In
> > general, what is the horizontal scaling model we should use? Also, is
> > there an upper limit to the number of brokers you should put in a
> > cluster, after which you get diminishing returns on throughput?
> >
> > Thanks,
> > Scott Arthur
> >