If you don't want or need automated rebalancing or partition reassignment amongst clients, then you could always have each worker/client subscribe directly to individual partitions using consumer.assign() rather than consumer.subscribe(). That way, when client 1 is restarted, the data in its partitions will not get assigned to any other client, and it will simply pick up consuming from the same partitions when it's restarted 3 seconds later. Same for client 2 and so on.
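For example, here is a minimal sketch of the assign() approach with the plain Java consumer. The topic name "my-topic", the partition numbers, and the config values are placeholders, not anything from this thread:

import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class ManualAssignWorker {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        // group.id is still used for committing offsets, but with assign()
        // there is no group membership and therefore no rebalancing.
        props.put("group.id", "my-workers");
        props.put("enable.auto.commit", "false");
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        // Pin this worker to specific partitions. A restart a few seconds
        // later just resumes from the committed offsets of these same
        // partitions; no other worker is ever handed them.
        consumer.assign(Arrays.asList(
                new TopicPartition("my-topic", 0),
                new TopicPartition("my-topic", 1)));
        try {
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(500);
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            record.partition(), record.offset(), record.value());
                }
                consumer.commitSync();
            }
        } finally {
            consumer.close();
        }
    }
}

Since there is no group membership at all with assign(), spreading partitions across workers is entirely up to you (and see the drawback below about new partitions).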
The drawback of doing your own manual partition assignment is that it's manual ;-) If a new partition is created, your code won't automatically know to consume from it (one possible workaround is sketched below, after the quoted thread).

-hans

> On Jan 6, 2017, at 4:42 PM, Pradeep Gollakota <pradeep...@gmail.com> wrote:
>
> What I mean by "flapping" in this context is unnecessary rebalancing
> happening. The example I would give is what a Hadoop Datanode would do in
> case of a shutdown. By default, it will wait 10 minutes before replicating
> the blocks owned by the Datanode, so routine maintenance wouldn't cause
> unnecessary shuffling of blocks.
>
> In this context, if I'm performing a rolling restart, as soon as worker 1
> shuts down, its work is picked up by other workers. But worker 1 comes
> back 3 seconds (or whatever) later and requests the work back. Then worker
> 2 goes down and its work is assigned to other workers for 3 seconds before
> yet another rebalance. So, in theory, the order of operations will look
> something like this:
>
> STOP (1) -> REBALANCE -> START (1) -> REBALANCE -> STOP (2) -> REBALANCE ->
> START (2) -> REBALANCE -> ....
>
> From what I understand, there's currently no way to prevent this type of
> shuffling of partitions from worker to worker while the consumers are under
> maintenance. I'm also not sure if this is an issue I don't need to worry about.
>
> - Pradeep
>
> On Thu, Jan 5, 2017 at 8:29 PM, Ewen Cheslack-Postava <e...@confluent.io>
> wrote:
>
>> Not sure I understand your question about flapping. The LeaveGroupRequest
>> is only sent on a graceful shutdown. If a consumer knows it is going to
>> shut down, it is good to proactively make sure the group knows it needs to
>> rebalance work because some of the partitions that were handled by the
>> consumer need to be handled by some other group members.
>>
>> There's no "flapping" in the sense that the leave group requests should
>> just inform the other members that they need to take over some of the work.
>> I would normally think of "flapping" as meaning that things start/stop
>> unnecessarily. In this case, *someone* needs to deal with the rebalance and
>> pick up the work being dropped by the worker. There's no flapping because
>> it's a one-time event -- one worker is shutting down, decides to drop the
>> work, and a rebalance sorts it out and reassigns it to another member of
>> the group. This happens once and then the "issue" is resolved without any
>> additional interruptions.
>>
>> -Ewen
>>
>> On Thu, Jan 5, 2017 at 3:01 PM, Pradeep Gollakota <pradeep...@gmail.com>
>> wrote:
>>
>>> I see... doesn't that cause flapping though?
>>>
>>> On Wed, Jan 4, 2017 at 8:22 PM, Ewen Cheslack-Postava <e...@confluent.io>
>>> wrote:
>>>
>>>> The coordinator will immediately move the group into a rebalance if it
>>>> needs it. The reason LeaveGroupRequest was added was to avoid having to
>>>> wait for the session timeout before completing a rebalance. So aside from
>>>> the latency of cleanup/committing offsets/rejoining after a heartbeat,
>>>> rolling bounces should be fast for consumer groups.
>>>>
>>>> -Ewen
>>>>
>>>> On Wed, Jan 4, 2017 at 5:19 PM, Pradeep Gollakota <pradeep...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Kafka folks!
>>>>>
>>>>> When a consumer is closed, it will issue a LeaveGroupRequest. Does anyone
>>>>> know how long the coordinator waits before reassigning the partitions that
>>>>> were assigned to the leaving consumer to a new consumer? I ask because I'm
>>>>> trying to understand the behavior of consumers if you're doing a rolling
>>>>> restart.
>>>>>
>>>>> Thanks!
>>>>> Pradeep
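On the "new partition" drawback mentioned above: one possible workaround, and this is only a rough sketch rather than anything from the thread, is to re-check the topic metadata from the poll loop with partitionsFor() and call assign() again when the partition count grows. The class name is made up, and deciding which of the new partitions this particular worker should take is left to your own logic:

import java.util.List;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.PartitionInfo;

public class PartitionCountCheck {
    private int lastKnownCount = 0;

    // Call this every so often from the poll loop. Returns true when the
    // topic has gained partitions since the last check, which is the cue to
    // recompute the partition set this worker should pass to assign().
    public boolean topicGrew(KafkaConsumer<?, ?> consumer, String topic) {
        List<PartitionInfo> partitions = consumer.partitionsFor(topic);
        boolean grew = partitions.size() > lastKnownCount;
        lastKnownCount = partitions.size();
        return grew;
    }
}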