All,

Thanks. Everything you've said matches what I was already assuming or
knew. I was wondering if there were alternative solutions, because the
concern that came up is similar to the burst situation: if our existing
cluster gets backed up and we add more partitions, those same partitions
are still backed up; they just get populated more slowly, barring some
weird partitioning bug.
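
To put rough numbers on that burst scenario (using the illustrative
100K/sec steady rate, 1M/sec burst, and 10-minute figures from Joe's
example below; the "spare capacity" assumption is mine, not from the
thread), a back-of-the-envelope sketch:

    // Back-of-the-envelope: the backlog that builds up on the ORIGINAL
    // partitions during a burst has to be drained by the consumers already
    // assigned to them; partitions added afterwards only receive new messages.
    public class BacklogSketch {
        public static void main(String[] args) {
            long steadyRatePerSec = 100_000;   // illustrative steady consume rate (msgs/sec)
            long burstRatePerSec  = 1_000_000; // illustrative burst produce rate (msgs/sec)
            long burstSeconds     = 600;       // 10-minute burst

            long backlog = (burstRatePerSec - steadyRatePerSec) * burstSeconds;

            // Optimistic assumption: normal traffic pauses after the burst, so
            // the full steady rate is available as spare capacity for catch-up.
            double hoursToDrain = backlog / (double) steadyRatePerSec / 3600.0;
            System.out.printf("backlog: %,d messages, ~%.1f hours to drain%n",
                    backlog, hoursToDrain);
        }
    }

If normal traffic keeps arriving instead of pausing, the catch-up only
takes longer, which is the "over 2 hours" point in Joe's reply below.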
Again, thanks.
________________________________________
From: Joe Stein [joe.st...@stealth.ly]
Sent: Friday, November 21, 2014 12:15 AM
To: users@kafka.apache.org
Subject: Re: Elastic Scaling

Meant to say "burst 1,000,000 messages per second on those X partitions
for 10 minutes"

On Fri, Nov 21, 2014 at 12:12 AM, Joe Stein <joe.st...@stealth.ly> wrote:

> You need to be thoughtful about adding more partitions. This is
> paramount if you are doing semantic partitioning, in which case adding
> more partitions could break things downstream.
>
> If you average, let's say, 100,000 messages per second, and at full
> tilt, consuming 1:1 (one consumer per partition), you can process
> 100,000 messages per second on, let's say, X partitions, and you then
> burst 1,000,000 messages on those X partitions for 10 minutes, then
> adding more partitions after that burst won't help you: it will still
> take you over 2 hours to "catch up" on the million that came in, because
> the extra partitions weren't there yet to compensate.
>
> It is typical to use something like 100 partitions as the default when
> creating topics, and if you know you need more, go for it beforehand.
> But yes, if you need to, you can definitely add more later.
>
> /*******************************************
>  Joe Stein
>  Founder, Principal Consultant
>  Big Data Open Source Security LLC
>  http://www.stealth.ly
>  Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
> ********************************************/
>
> On Thu, Nov 20, 2014 at 11:57 PM, Daniel Compton <d...@danielcompton.net>
> wrote:
>
>> While it's good to plan ahead for growth, Kafka will still let you add
>> more partitions to a topic:
>> https://kafka.apache.org/081/ops.html#basic_ops_modify_topic. This will
>> rebalance the hashing if you are partitioning by your key, and
>> consumers will probably end up with different partitions, but don't
>> feel like you have to make the perfect config right at the start.
>>
>> Daniel.
>>
>> > On 21/11/2014, at 5:44 pm, Joe Stein <joe.st...@stealth.ly> wrote:
>> >
>> > If you plan ahead of time with enough partitions, then you won't run
>> > into an issue of backed-up consumers when you scale them up.
>> >
>> > If you have 100 partitions, 20 consumers can read from them (each
>> > could read from 5 partitions). You can scale up to 100 consumers (one
>> > for each partition) as the upper limit. If you need more than that,
>> > you should have had more than 100 partitions to start. Scaling down
>> > can go to 1 consumer if you want, as 1 consumer can read from N
>> > partitions.
>> >
>> > If you are using the JVM you can look at
>> > https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Group+Example
>> > and
>> > https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+SimpleConsumer+Example
>> > There are other options in other languages, and in the JVM too:
>> > https://cwiki.apache.org/confluence/display/KAFKA/Clients
>> >
>> > At the end of the day the Kafka broker will not impose any
>> > limitations on what you are asking currently (as per the wire
>> > protocol:
>> > https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol);
>> > it is all about how the consumer is designed and developed.
>> >
>> > /*******************************************
>> >  Joe Stein
>> >  Founder, Principal Consultant
>> >  Big Data Open Source Security LLC
>> >  http://www.stealth.ly
>> >  Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
>> > ********************************************/
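
For reference, a minimal sketch of the consumer-group pattern that the
Consumer Group Example linked above describes (assuming the 0.8
high-level consumer Java API; the ZooKeeper address, group id, and topic
name are placeholders):

    import java.util.Collections;
    import java.util.List;
    import java.util.Map;
    import java.util.Properties;

    import kafka.consumer.ConsumerConfig;
    import kafka.consumer.ConsumerIterator;
    import kafka.consumer.KafkaStream;
    import kafka.javaapi.consumer.ConsumerConnector;

    public class GroupConsumerSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("zookeeper.connect", "zk1:2181");  // placeholder
            props.put("group.id", "my-consumer-group");  // all instances share this id
            props.put("auto.commit.interval.ms", "1000");

            ConsumerConnector connector =
                    kafka.consumer.Consumer.createJavaConsumerConnector(new ConsumerConfig(props));

            // Ask for one stream for the topic; the ZooKeeper-based rebalance
            // assigns this process a subset of the topic's partitions. Starting
            // more processes with the same group.id spreads the partitions
            // across them, up to one consumer per partition.
            Map<String, List<KafkaStream<byte[], byte[]>>> streams =
                    connector.createMessageStreams(Collections.singletonMap("my-topic", 1));

            ConsumerIterator<byte[], byte[]> it = streams.get("my-topic").get(0).iterator();
            while (it.hasNext()) {
                byte[] message = it.next().message();
                // process the message here
            }
        }
    }

Scaling up is then a matter of starting more copies of this process with
the same group.id (up to one per partition); scaling down is stopping
some of them, and the remaining consumers pick up the freed partitions
after a rebalance.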
>> > On Thu, Nov 20, 2014 at 3:18 PM, Sybrandy, Casey
>> > <casey.sybra...@six3systems.com> wrote:
>> >
>> >> Hello,
>> >>
>> >> We're looking into using Kafka for an improved version of a system,
>> >> and the question of how to scale Kafka came up. Specifically, we
>> >> want to try to make the system scale as transparently as possible.
>> >> The concern was that if we go from N to N*2 consumers, we would have
>> >> some that are still backed up while the new ones were working on
>> >> only some of the new records. Also, if the load drops, can we scale
>> >> down effectively?
>> >>
>> >> I'm sure there's a way to do it. I'm just hoping that someone has
>> >> some knowledge in this area.
>> >>
>> >> Thanks.
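
To illustrate the N-to-N*2 question above: within a consumer group, the
partitions of a topic are split as evenly as possible across the live
consumers, and any consumer beyond the partition count sits idle. A rough
sketch of that arithmetic (a simplified model of the even split, not
Kafka's actual assignment code; the counts echo Joe's 100-partition
example above):

    public class AssignmentSketch {
        // Roughly how many partitions one consumer in a group ends up with:
        // partitions are split as evenly as possible, and any consumer beyond
        // the partition count gets nothing to read.
        static int partitionsFor(int consumerIndex, int numPartitions, int numConsumers) {
            int base = numPartitions / numConsumers;
            int extra = numPartitions % numConsumers;
            return base + (consumerIndex < extra ? 1 : 0);
        }

        public static void main(String[] args) {
            int partitions = 100;
            for (int consumers : new int[] {20, 50, 100, 200}) {
                int idle = Math.max(0, consumers - partitions);
                System.out.printf("%d partitions, %d consumers -> up to %d each, %d idle%n",
                        partitions, consumers, partitionsFor(0, partitions, consumers), idle);
            }
        }
    }

So doubling consumers from N to N*2 helps only while N*2 is still at or
below the partition count, and each partition's backlog is still drained
by the single consumer that owns it after the rebalance.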