"From what I understand, there's currently no way to prevent this type of shuffling of partitions from worker to worker while the consumers are under maintenance. I'm also not sure if this an issue I don't need to worry about."
If you don't want rebalancing, consumers can also manually subscribe to specific partitions, if you can live with not processing those partitions until your consumer is back up or another consumer is stopped/started to pick up those specific partitions.

On Fri, Jan 6, 2017 at 4:23 PM, <h...@confluent.io> wrote:

> If you don't want or need automated rebalancing or partition reassignment
> amongst clients, then you could always just have each worker/client
> subscribe directly to individual partitions using consumer.assign() rather
> than consumer.subscribe(). That way, when client 1 is restarted, the data in
> its partitions will not get assigned to any other client, and it will just
> pick up consuming from the same partitions when it's restarted 3 seconds
> later. Same for client 2 and so on.
>
> The drawback of doing your own manual partition assignment is that it's
> manual ;-) If a new partition is created, your code won't automatically know
> to consume from it.
>
> -hans
>
> On Jan 6, 2017, at 4:42 PM, Pradeep Gollakota <pradeep...@gmail.com> wrote:
>
> > What I mean by "flapping" in this context is unnecessary rebalancing
> > happening. The example I would give is what a Hadoop Datanode would do in
> > case of a shutdown. By default, it will wait 10 minutes before replicating
> > the blocks owned by the Datanode, so routine maintenance wouldn't cause
> > unnecessary shuffling of blocks.
> >
> > In this context, if I'm performing a rolling restart, as soon as worker 1
> > shuts down, its work is picked up by other workers. But worker 1 comes
> > back 3 seconds (or whatever) later and requests the work back. Then worker
> > 2 goes down and its work is assigned to other workers for 3 seconds before
> > yet another rebalance. So, in theory, the order of operations will look
> > something like this:
> >
> > STOP (1) -> REBALANCE -> START (1) -> REBALANCE -> STOP (2) -> REBALANCE ->
> > START (2) -> REBALANCE -> ...
> >
> > From what I understand, there's currently no way to prevent this type of
> > shuffling of partitions from worker to worker while the consumers are under
> > maintenance. I'm also not sure if this is an issue I don't need to worry
> > about.
> >
> > - Pradeep
> >
> > On Thu, Jan 5, 2017 at 8:29 PM, Ewen Cheslack-Postava <e...@confluent.io> wrote:
> >
> >> Not sure I understand your question about flapping. The LeaveGroupRequest
> >> is only sent on a graceful shutdown. If a consumer knows it is going to
> >> shut down, it is good to proactively make sure the group knows it needs to
> >> rebalance work, because some of the partitions that were handled by the
> >> consumer need to be handled by some other group members.
> >>
> >> There's no "flapping" in the sense that the leave group requests should
> >> just inform the other members that they need to take over some of the work.
> >> I would normally think of "flapping" as meaning that things start/stop
> >> unnecessarily. In this case, *someone* needs to deal with the rebalance and
> >> pick up the work being dropped by the worker. There's no flapping because
> >> it's a one-time event -- one worker is shutting down, decides to drop the
> >> work, and a rebalance sorts it out and reassigns it to another member of
> >> the group. This happens once and then the "issue" is resolved without any
> >> additional interruptions.
> >>
> >> -Ewen
> >>
> >> On Thu, Jan 5, 2017 at 3:01 PM, Pradeep Gollakota <pradeep...@gmail.com> wrote:
> >>
> >>> I see... doesn't that cause flapping though?
> >>>
> >>> On Wed, Jan 4, 2017 at 8:22 PM, Ewen Cheslack-Postava <e...@confluent.io> wrote:
> >>>
> >>>> The coordinator will immediately move the group into a rebalance if it
> >>>> needs it. The reason LeaveGroupRequest was added was to avoid having to
> >>>> wait for the session timeout before completing a rebalance. So aside from
> >>>> the latency of cleanup/committing offsets/rejoining after a heartbeat,
> >>>> rolling bounces should be fast for consumer groups.
> >>>>
> >>>> -Ewen
> >>>>
> >>>> On Wed, Jan 4, 2017 at 5:19 PM, Pradeep Gollakota <pradeep...@gmail.com> wrote:
> >>>>
> >>>>> Hi Kafka folks!
> >>>>>
> >>>>> When a consumer is closed, it will issue a LeaveGroupRequest. Does anyone
> >>>>> know how long the coordinator waits before reassigning the partitions that
> >>>>> were assigned to the leaving consumer to a new consumer? I ask because I'm
> >>>>> trying to understand the behavior of consumers if you're doing a rolling
> >>>>> restart.
> >>>>>
> >>>>> Thanks!
> >>>>> Pradeep

--
Radha Krishna, Proddaturi
253-234-5657
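To make the manual-assignment suggestion above concrete, here is a minimal sketch against the 0.10-era Java consumer API (newer clients take a Duration in poll()). The broker address, group id, topic name, and partition numbers are placeholders to adapt to your own setup.

import java.util.Arrays;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class ManualAssignmentWorker {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // placeholder broker
        props.put("group.id", "my-group");                  // used only for offset commits here
        props.put("enable.auto.commit", "false");
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);

        // assign() pins this worker to fixed partitions. Because the consumer
        // never joins the group protocol, stopping or restarting this process
        // does not trigger a rebalance; the partitions simply sit unconsumed
        // until it comes back. subscribe() would instead enroll it in the
        // group and cause a rebalance on every join/leave.
        List<TopicPartition> myPartitions = Arrays.asList(
                new TopicPartition("my-topic", 0),  // placeholder topic/partitions
                new TopicPartition("my-topic", 1));
        consumer.assign(myPartitions);

        try {
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(500);
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            record.partition(), record.offset(), record.value());
                }
                consumer.commitSync();  // commit offsets after processing each batch
            }
        } finally {
            consumer.close();
        }
    }
}

As noted above, the trade-off is that the partition-to-worker mapping is now yours to maintain: if partitions are added or the set of workers changes, this assignment has to be updated by hand or through your own coordination.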