Thanks Gwen. This really helped. Yes, Kafka is the best thing ever :)
Now how would this be done with the Simple consumer? I'm guessing I'll have to maintain my own state in Zookeeper or something of that sort? On Thu, Oct 9, 2014 at 12:01 AM, Gwen Shapira <gshap...@cloudera.com> wrote: > Here's an example (from ConsumerOffsetChecker tool) of 1 topic (t1) > and 1 consumer group (flume), each of the 3 topic partitions is being > read by a different machine running the flume consumer: > Group Topic Pid Offset > logSize Lag Owner > flume t1 0 50172068 > 100210042 50037974 > flume_kafkacdh-1.ent.cloudera.com-1412722833783-3d6d80db-0 > flume t1 1 49914701 > 49914701 0 > flume_kafkacdh-2.ent.cloudera.com-1412722838536-a6a4915d-0 > flume t1 2 54218841 > 82733380 28514539 > flume_kafkacdh-3.ent.cloudera.com-1412722832793-b23eaa63-0 > > If flume_kafkacdh-1 crashed, another broker will pick up the partition: > Group Topic Pid Offset > logSize Lag Owner > flume t1 0 59669715 > 100210042 40540327 > flume_kafkacdh-2.ent.cloudera.com-1412792880818-b4aa6feb-0 > flume t1 1 49914701 > 49914701 0 > flume_kafkacdh-2.ent.cloudera.com-1412792880818-b4aa6feb-0 > flume t1 2 65796205 > 82733380 16937175 > flume_kafkacdh-3.ent.cloudera.com-1412792871089-cabd4934-0 > > Then I can start flume_kafkacdh-4 and see things rebalance again: > flume t1 0 60669715 > 100210042 39540327 > flume_kafkacdh-2.ent.cloudera.com-1412792880818-b4aa6feb-0 > flume t1 1 49914701 > 49914701 0 > flume_kafkacdh-3.ent.cloudera.com-1412792871089-cabd4934-0 > flume t1 2 66829740 > 82733380 15903640 > flume_kafkacdh-4.ent.cloudera.com-1412793053882-9bfddff9-0 > > Isn't Kafka the best thing ever? :) > > Gwen > > On Wed, Oct 8, 2014 at 11:23 AM, Gwen Shapira <gshap...@cloudera.com> > wrote: > > yep. exactly. > > > > On Wed, Oct 8, 2014 at 11:07 AM, Sharninder <sharnin...@gmail.com> > wrote: > >> Thanks Gwen. > >> > >> When you're saying that I can add consumers to the same group, does that > >> also hold true if those consumers are running on different machines? Or > in > >> different JVMs? > >> > >> -- > >> Sharninder > >> > >> > >> On Wed, Oct 8, 2014 at 11:35 PM, Gwen Shapira <gshap...@cloudera.com> > wrote: > >> > >>> If you use the high level consumer implementation, and register all > >>> consumers as part of the same group - they will load-balance > >>> automatically. > >>> > >>> When you add a consumer to the group, if there are enough partitions > >>> in the topic, some of the partitions will be assigned to the new > >>> consumer. > >>> When a consumer crashes, once its node in ZK times out, other > >>> consumers will get its partitions. > >>> > >>> Gwen > >>> > >>> On Wed, Oct 8, 2014 at 10:39 AM, Sharninder <sharnin...@gmail.com> > wrote: > >>> > Hi, > >>> > > >>> > I'm not even sure if this is a valid use-case, but I really wanted > to run > >>> > it by you guys. How do I load balance my consumers? For example, if > my > >>> > consumer machine is under load, I'd like to spin up another VM with > >>> another > >>> > consumer process to keep reading messages off any topic. On similar > >>> lines, > >>> > how do you guys handle consumer failures? Suppose one consumer > process > >>> gets > >>> > an exception and crashes, is it possible for me to somehow make sure > that > >>> > there is another process that is still reading the queue for me? > >>> > > >>> > -- > >>> > Sharninder > >>> >