Hi Lucas,

> LB01: I'm just wondering if it would have been an option to instead
> update the target assignment to remove the partitions from the member
> immediately when the member unsubscribes?

Good question. I can't recall my thought process when implementing KAFKA-19431. Three points spring to mind with updating the target assignment on subscription changes:

1. I think I wanted to preserve the property that the target assignment is only updated by assignment runs and immutable for a given epoch. Though it turns out that's not actually the case since we do patch the target assignment when members leave or static members are replaced.
2. An offloaded assignment based on older subscriptions can complete right after we patch the target assignment to remove unsubscribed topics so we would need to do some extra filtering on assignor completion.
3. Even if we update the target assignment, we would need to touch the reconciliation process anyway, since it wouldn't do anything when there is no epoch bump.

There is certainly nothing stopping us from updating the target assignment. I think it seemed cleaner at the time to keep it all in the reconciliation process.
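To make the filtering concrete, here is a rough sketch of the kind of subscription check reconciliation would apply. The class and method names are illustrative only and are not the actual group coordinator code; the only real dependency assumed is org.apache.kafka.common.Uuid for topic ids.

import java.util.HashMap;
import java.util.Map;
import java.util.Set;

import org.apache.kafka.common.Uuid;

public class SubscriptionFilterSketch {
    /**
     * Illustrative sketch: given a member's target assignment (topic id ->
     * partitions) and the topic ids the member is currently subscribed to,
     * keep only the partitions of subscribed topics. Anything else would be
     * revoked during reconciliation, because the target assignment may lag
     * behind the member's subscription.
     */
    public static Map<Uuid, Set<Integer>> filterBySubscription(
        Map<Uuid, Set<Integer>> targetAssignment,
        Set<Uuid> subscribedTopicIds
    ) {
        Map<Uuid, Set<Integer>> filtered = new HashMap<>();
        for (Map.Entry<Uuid, Set<Integer>> entry : targetAssignment.entrySet()) {
            if (subscribedTopicIds.contains(entry.getKey())) {
                filtered.put(entry.getKey(), entry.getValue());
            }
        }
        return filtered;
    }
}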
Thanks,
Sean

On Fri, Jan 23, 2026 at 12:32 PM Lucas Brutschy <[email protected]> wrote:

> Hey Sean,
>
> thanks for the KIP! This makes a lot of sense to me. I don't really have anything I want you to change about the KIP.
>
> > We modify reconciliation to revoke any partitions the member is no longer subscribed to, since the target assignment may lag behind member subscriptions.
>
> LB01: I'm just wondering if it would have been an option to instead update the target assignment to remove the partitions from the member immediately when the member unsubscribes?
>
> Cheers,
> Lucas
>
> On Thu, Jan 22, 2026 at 11:58 AM Sean Quah via dev <[email protected]> wrote:
> >
> > > LM1: About group.initial.rebalance.delay.ms, I expect the interaction with the interval is just as described for the streams initial delay and interval, correct? Should we clarify that in the KIP (it only mentions the streams case)
> >
> > We haven't added a consumer or share group initial.rebalance.delay.ms config yet. It only exists for streams right now.
> >
> > > LM2: The KIP refers to batching assignment re-calculations triggered by member subscriptions changes, but I expect the batching mechanism applies the same when the assignment re-calculation is triggered by metadata changes (i.e topic/partition created or deleted), without any HB changing subscriptions. Is my understanding correct?
> >
> > Yes, that's right. Topic metadata changes also bump the group epoch and trigger the same assignment flow.
> >
> > > LM3: About this section: "When there is an in-flight assignor run for the group, there is no new target assignment. We will trigger the next assignor run on a future heartbeat.". I expect that the next assignor run will be triggered on the next HB from this or from any other member of the group, received after the interval expires (without the members re-sending the subscription change). Is my expectation correct? If so, it may be worth clarifying in the KIP to avoid confusion with client-side implementations.
> >
> > I tried to clarify in the KIP. Let me know your thoughts!
> >
> > On Thu, Jan 22, 2026 at 10:56 AM Sean Quah <[email protected]> wrote:
> > >
> > > > dl01: Could we mention the handling when the group metadata or topic partition metadata is changed or deleted during the async assignor run?
> > > Thanks! I've added a paragraph to the Assignment Offload section describing the handling of group metadata changes. Topic metadata changes already bump the group epoch and we don't need to handle them specially.
> > >
> > > > dl02: This might be a question for the overall coordinator executor - do we have plans to apply an explicit size limit to the executor queue? If many groups trigger offloaded assignments simultaneously, should we apply some backpressure for protection?
> > >
> > > There aren't any plans for that right now. We actually don't have a size limit for the event processor queue either.
> > >
> > > On Thu, Jan 22, 2026 at 10:56 AM Sean Quah <[email protected]> wrote:
> > > >
> > > > Hi all, thanks for the feedback so far.
> > > >
> > > > > dj01: In the proposed changes section, you state that the timestamp of the last assignment is not persisted. How do you plan to bookkeep it if it is not stored with the assignment? Intuitively, I would add a timestamp to the assignment record.
> > > >
> > > > Thinking about it, it's easier to add it to the assignment record. I will update the KIP. One thing to note is that the timestamp will be subject to rollbacks when writing to the log fails, so we can allow extra assignment runs when that happens.
> > > >
> > > > > dj02: I wonder whether we should also add a "thread idle ratio" metric for the group coordinator executor. What do you think?
> > > >
> > > > I think it could be useful so I've added it to the KIP. The implementation will have to be different to the event processor, since we currently use an ExecutorService.
> > > >
> > > > > dj03: If the executor is not used by the share coordinator, it should not expose any metrics about it. Is it possible to remove them?
> > > >
> > > > I've removed them from the KIP. We can add a parameter to the coordinator metrics class to control whether they are visible.
> > > >
> > > > > dj04: Is having one group coordinator executor thread sufficient by default for common workloads?
> > > >
> > > > Yes and no. I expect it will be very difficult to overload an entire thread, i.e. submit work faster than it can complete it. But updating the default to two threads could be good for reducing delays due to simultaneous assignor runs. I've raised the default to 2 threads.
> > > >
> > > > > dj05: It seems you propose enabling the minimum assignor interval with a default of 5 seconds. However, the offloading is not enabled by default. Is the first one enough to guarantee the stability of the group coordinator? How do you foresee enabling the second one in the future? It would be great if you could address this in the KIP. We need a clear motivation for changing the default behavior and a plan for the future.
> > > >
> > > > I initially thought that offloading would increase rebalance times by 1 heartbeat and so didn't propose turning it on by default. But after some more thinking, I believe both features will increase rebalance times by 1 heartbeat interval and the increase shouldn't stack. The minimum assignor interval only impacts groups with more than 2 members, while offloading only impacts groups with a single member. This is because in the other cases, the extra delays are folded into existing revocation + heartbeat delays.
> > > > Note that share groups have no revocation so always see increased rebalance times. I've updated the KIP to add the analysis of rebalance times and propose turning both features on by default.
> > > >
> > > > > dj06: Based on its description, I wonder whether `consumer.min.assignor.interval.ms` should be called `consumer.min.assignment.interval.ms`. What do you think?
> > > >
> > > > Thanks, I've renamed the config options in the KIP. What about the assignor.offload.enable configs?
> > > >
> > > > > dj07: It is not possible to enable/disable the offloading at the group level. This makes sense to me but it would be great to explain the rationale for it in the KIP.
> > > >
> > > > Thinking about it, there's nothing stopping us from configuring offloading at the group level. In fact it might be desirable for some users to disable offloading at the group coordinator level to keep rebalances fast and only enable it for problematic large groups. I've added a group-level override to the KIP.
> > > >
> > > > On Tue, Jan 20, 2026 at 1:38 PM Lianet Magrans <[email protected]> wrote:
> > > > >
> > > > > Hi Sean, thanks for the KIP.
> > > > >
> > > > > LM1: About group.initial.rebalance.delay.ms, I expect the interaction with the interval is just as described for the streams initial delay and interval, correct? Should we clarify that in the KIP (it only mentions the streams case)
> > > > >
> > > > > LM2: The KIP refers to batching assignment re-calculations triggered by member subscriptions changes, but I expect the batching mechanism applies the same when the assignment re-calculation is triggered by metadata changes (i.e topic/partition created or deleted), without any HB changing subscriptions. Is my understanding correct?
> > > > >
> > > > > LM3: About this section: "When there is an in-flight assignor run for the group, there is no new target assignment. We will trigger the next assignor run on a future heartbeat.". I expect that the next assignor run will be triggered on the next HB from this or from any other member of the group, received after the interval expires (without the members re-sending the subscription change). Is my expectation correct? If so, it may be worth clarifying in the KIP to avoid confusion with client-side implementations.
> > > > >
> > > > > Thanks!
> > > > > Lianet
> > > > >
> > > > > On Tue, Jan 13, 2026 at 1:23 AM Sean Quah via dev <[email protected]> wrote:
> > > > > >
> > > > > > sq01: We also have to update the SyncGroup request handling to only return REBALANCE_IN_PROGRESS when the member's epoch is behind the target assignment epoch, not the group epoch. Thanks to Dongnuo for pointing this out.
> > > > > >
> > > > > > On Thu, Jan 8, 2026 at 5:40 PM Dongnuo Lyu via dev <[email protected]> wrote:
> > > > > > >
> > > > > > > Hi Sean, thanks for the KIP! I have a few questions as follows.
> > > > > > >
> > > > > > > dl01: Could we mention the handling when the group metadata or topic partition metadata is changed or deleted during the async assignor run?
> > > > > > >
> > > > > > > dl02: This might be a question for the overall coordinator executor - do we have plans to apply an explicit size limit to the executor queue? If many groups trigger offloaded assignments simultaneously, should we apply some backpressure for protection?
> > > > > > >
> > > > > > > Also resonate with dj05, for small groups default `min.assignor.interval.ms` to 5s might not be necessary, so not sure if we want to make the batch assignment default. Or it might be good to have a per group enablement.
> > > > > > >
> > > > > > > Thanks
> > > > > > > Dongnuo
