You need to be thoughtful about adding more partitions. This is paramount
if you are doing semantic partitioning, in which case adding more
partitions could break things downstream.

Capacity is the other concern. Say you average 100,000 messages per second
and, at full tilt with consumers 1:1 for each partition, you can process
100,000 messages per second on X partitions. If you then take a burst of
1,000,000 messages per second on those X partitions for 10 minutes, adding
more partitions after that burst won't help you: the backlog is already
sitting on the original X partitions, and new partitions only receive new
data, so it will still take you over 2 hours to "catch up" on the messages
that came in before you added the partitions to compensate.
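
To put rough numbers on it (best case, assuming nothing else arrives
while you drain):

    backlog after burst = (1,000,000 - 100,000) msg/s * 600 s = 540,000,000 msgs
    best-case drain     = 540,000,000 msgs / 100,000 msg/s    = 5,400 s (~90 min)

and in practice the average 100,000/s keeps arriving too, eating into
your drain capacity, which is how the real catch-up stretches past two
hours.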

It is typical to just use something like 100 partitions as the default on
topic creation, and if you know you need more, go for it beforehand. But
yes, if you need to, you can definitely add more later.
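
For example, with the shell tools that ship with 0.8.1 (the topic name,
partition counts, and ZooKeeper address below are only placeholders):

    # create the topic with plenty of partitions up front
    bin/kafka-topics.sh --zookeeper localhost:2181 --create \
        --topic my-topic --partitions 100 --replication-factor 3

    # later, if you must, raise the count (it can only ever go up)
    bin/kafka-topics.sh --zookeeper localhost:2181 --alter \
        --topic my-topic --partitions 200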

/*******************************************
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
********************************************/

On Thu, Nov 20, 2014 at 11:57 PM, Daniel Compton <d...@danielcompton.net>
wrote:

> While it’s good to plan ahead for growth, Kafka will still let you add
> more partitions to a topic:
> https://kafka.apache.org/081/ops.html#basic_ops_modify_topic. This will
> change which partition each key hashes to if you are partitioning by
> key, and consumers will probably end up with different partitions, but
> don’t feel like you have to get the config perfect right at the start.
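>
> As a rough sketch of why the mapping moves (this mirrors the spirit of
> the default partitioner, which mod-hashes the key over the partition
> count; the key and counts here are made up):
>
>     // same key, different partition counts -> different partitions
>     static int partitionFor(Object key, int numPartitions) {
>         // mask the sign bit so the modulo result is never negative
>         return (key.hashCode() & 0x7fffffff) % numPartitions;
>     }
>     // partitionFor("user-42", 100) and partitionFor("user-42", 150)
>     // will generally disagree, so new messages for a key stop landing
>     // on the partition that holds that key's history.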
>
> Daniel.
>
> > On 21/11/2014, at 5:44 pm, Joe Stein <joe.st...@stealth.ly> wrote:
> >
> > If you plan ahead of time with enough partitions, then you won't fall
> > into an issue of backed-up consumers when you scale them up.
> >
> > If you have 100 partitions, 20 consumers can read from them (each
> > could read from 5 partitions). You can scale up to 100 consumers (one
> > for each partition) as the upper limit. If you need more than that,
> > you should have had more than 100 partitions to start. Scaling down
> > can go to 1 consumer if you wanted, as 1 consumer can read from N
> > partitions.
> >
> > If you are using the JVM you can look at
> > https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Group+Example
> > and
> > https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+SimpleConsumer+Example
> > There are other options in other languages, and on the JVM too:
> > https://cwiki.apache.org/confluence/display/KAFKA/Clients
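> >
> > Along the lines of the Consumer Group Example above, a minimal
> > high-level consumer looks roughly like this (the group id, topic,
> > thread count, and ZooKeeper address are placeholders):
> >
> >     import java.util.*;
> >     import kafka.consumer.ConsumerConfig;
> >     import kafka.consumer.ConsumerIterator;
> >     import kafka.consumer.KafkaStream;
> >     import kafka.javaapi.consumer.ConsumerConnector;
> >
> >     Properties props = new Properties();
> >     props.put("zookeeper.connect", "localhost:2181"); // placeholder
> >     props.put("group.id", "my-group");                // placeholder
> >     ConsumerConnector connector =
> >         kafka.consumer.Consumer.createJavaConsumerConnector(
> >             new ConsumerConfig(props));
> >
> >     // ask for 4 streams on the topic; each should get its own thread
> >     Map<String, Integer> topicCountMap = new HashMap<String, Integer>();
> >     topicCountMap.put("my-topic", 4);                 // placeholder topic
> >     Map<String, List<KafkaStream<byte[], byte[]>>> streams =
> >         connector.createMessageStreams(topicCountMap);
> >
> >     for (KafkaStream<byte[], byte[]> stream : streams.get("my-topic")) {
> >         ConsumerIterator<byte[], byte[]> it = stream.iterator();
> >         while (it.hasNext()) {
> >             System.out.println(new String(it.next().message()));
> >         }
> >     }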
> >
> > At the end of the day, the Kafka broker will not impose any
> > limitations on what you are asking about currently (as per the wire
> > protocol:
> > https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol
> > ); it is all about how the consumer is designed and developed.
> >
> > /*******************************************
> > Joe Stein
> > Founder, Principal Consultant
> > Big Data Open Source Security LLC
> > http://www.stealth.ly
> > Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
> > ********************************************/
> >
> > On Thu, Nov 20, 2014 at 3:18 PM, Sybrandy, Casey <
> > casey.sybra...@six3systems.com> wrote:
> >
> >> Hello,
> >>
> >> We're looking into using Kafka for an improved version of a system,
> >> and the question of how to scale Kafka came up.  Specifically, we
> >> want to try to make the system scale as transparently as possible.
> >> The concern was that if we go from N to N*2 consumers, we would have
> >> some that are still backed up while the new ones were working on only
> >> some of the new records.  Also, if the load drops, can we scale down
> >> effectively?
> >>
> >> I'm sure there's a way to do it.  I'm just hoping that someone has some
> >> knowledge in this area.
> >>
> >> Thanks.
> >>
>
>
