Re: [DISCUSS] KIP-1263: Group Coordinator Assignment Batching and Offload

Lucas Brutschy via dev Mon, 26 Jan 2026 04:43:08 -0800

Hi Sean,

that makes a lot of sense, thanks for the explanation!


Cheers,
Lucas
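
P.S. For anyone reading the archive later: the reconciliation change
discussed under LB01 below (revoking partitions the member is no longer
subscribed to, because the target assignment may lag behind subscriptions)
can be sketched roughly as follows. This is only an illustration, not the
coordinator code. The function names and the "topic name -> partition ids"
shape are assumptions made up for the example.

```python
from typing import Dict, Set

# Hypothetical shape for illustration: topic name -> set of partition ids.
Assignment = Dict[str, Set[int]]


def effective_target(target: Assignment, subscribed: Set[str]) -> Assignment:
    """Drop target partitions for topics the member no longer subscribes to."""
    return {topic: set(parts) for topic, parts in target.items() if topic in subscribed}


def partitions_to_revoke(
    owned: Assignment, target: Assignment, subscribed: Set[str]
) -> Assignment:
    """Owned partitions that are absent from the subscription-filtered target."""
    wanted = effective_target(target, subscribed)
    revoke: Assignment = {}
    for topic, parts in owned.items():
        extra = parts - wanted.get(topic, set())
        if extra:
            revoke[topic] = extra
    return revoke
```

The point of the sketch is that the stored target assignment is left
untouched. The filtering happens at reconciliation time, so a stale target
cannot keep a member on a topic it has already unsubscribed from.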

On Sat, Jan 24, 2026 at 11:17 AM Sean Quah via dev <[email protected]> wrote:
>
> Hi Lucas,
>
> LB01: I'm just wondering if it would have been an option to instead
> > update the target assignment to remove the partitions from the member
> > immediately when the member unsubscribes?
>
> Good question. I can't recall my thought process when implementing
> KAFKA-19431. Three points spring to mind with updating the target
> assignment on subscription changes:
> 1. I think I wanted to preserve the property that the target assignment is
> only updated by assignment runs and immutable for a given epoch. Though it
> turns out that's not actually the case since we do patch the target
> assignment when members leave or static members are replaced.
> 2. An offloaded assignment based on older subscriptions can complete right
> after we patch the target assignment to remove unsubscribed topics so we
> would need to do some extra filtering on assignor completion.
> 3. Even if we update the target assignment, we would need to touch the
> reconciliation process anyway, since it wouldn't do anything when there is
> no epoch bump.
>
> There is certainly nothing stopping us from updating the target assignment.
> I think it seemed cleaner at the time to keep it all in the reconciliation
> process.
>
> Thanks,
> Sean
>
> On Fri, Jan 23, 2026 at 12:32 PM Lucas Brutschy <[email protected]>
> wrote:
>
> > Hey Sean,
> >
> > thanks for the KIP! This makes a lot of sense to me. I don't really
> > have anything I want you to change about the KIP.
> >
> > > We modify reconciliation to revoke any partitions the member is no
> > longer subscribed to, since the target assignment may lag behind member
> > subscriptions.
> >
> > LB01: I'm just wondering if it would have been an option to instead
> > update the target assignment to remove the partitions from the member
> > immediately when the member unsubscribes?
> >
> > Cheers,
> > Lucas
> >
> > On Thu, Jan 22, 2026 at 11:58 AM Sean Quah via dev <[email protected]>
> > wrote:
> > >
> > > >
> > > > LM1: About group.initial.rebalance.delay.ms, I expect the interaction
> > > > with the interval is just as described for the streams initial delay
> > and
> > > > interval, correct? Should we clarify that in the KIP (it only mentions
> > the
> > > > streams case)
> > >
> > > We haven't added a consumer or share group initial.rebalance.delay.ms
> > > config yet. It only exists for streams right now.
> > >
> > > LM2: The KIP refers to batching assignment re-calculations triggered by
> > > > member subscriptions changes, but I expect the batching mechanism
> > applies
> > > > the same when the assignment re-calculation is triggered by metadata
> > > > changes (i.e. topic/partition created or deleted), without any HB
> > changing
> > > > subscriptions. Is my understanding correct?
> > >
> > > Yes, that's right. Topic metadata changes also bump the group epoch and
> > > trigger the same assignment flow.
> > >
> > > LM3: About this section: "When there is an in-flight assignor run for the
> > > > group, there is no new target assignment. We will trigger the next
> > assignor
> > > > run on a future heartbeat.". I expect that the next assignor run will
> > be
> > > > triggered on the next HB from this or from any other member of the
> > group,
> > > > received after the interval expires (without the members re-sending the
> > > > subscription change). Is my expectation correct? If so, it may be worth
> > > > clarifying in the KIP to avoid confusion with client-side
> > implementations.
> > >
> > > I tried to clarify in the KIP. Let me know your thoughts!
> > >
> > > On Thu, Jan 22, 2026 at 10:56 AM Sean Quah <[email protected]> wrote:
> > >
> > > > dl01: Could we mention the handling when the group metadata or
> > > >> topic partition metadata is changed or deleted during the async
> > assignor
> > > >> run?
> > > >
> > > > Thanks! I've added a paragraph to the Assignment Offload section
> > > > describing the handling of group metadata changes. Topic metadata
> > changes
> > > > already bump the group epoch and we don't need to handle them
> > specially.
> > > >
> > > > dl02: This might be a question for the overall coordinator executor -
> > do
> > > >> we have plans to apply an explicit size limit to the executor queue?
> > If
> > > >> many groups trigger offloaded assignments simultaneously, should we
> > apply
> > > >> some backpressure for protection?
> > > >
> > > > There aren't any plans for that right now. We actually don't have a
> > size
> > > > limit for the event processor queue either.
> > > >
> > > > On Thu, Jan 22, 2026 at 10:56 AM Sean Quah <[email protected]> wrote:
> > > >
> > > >> Hi all, thanks for the feedback so far.
> > > >>
> > > >> dj01: In the proposed changes section, you state that the timestamp of
> > > >>> the last assignment is not persisted. How do you plan to bookkeep it
> > if it
> > > >>> is not stored with the assignment? Intuitively, I would add a
> > timestamp to
> > > >>> the assignment record.
> > > >>
> > > >> Thinking about it, it's easier to add it to the assignment record. I
> > will
> > > >> update the KIP. One thing to note is that the timestamp will be
> > subject to
> > > >> rollbacks when writing to the log fails, so we can allow extra
> > assignment
> > > >> runs when that happens.
> > > >>
> > > >> dj02: I wonder whether we should also add a "thread idle ratio" metric
> > > >>> for the group coordinator executor. What do you think?
> > > >>
> > > >> I think it could be useful so I've added it to the KIP. The
> > > >> implementation will have to be different to the event processor,
> > since we
> > > >> currently use an ExecutorService.
> > > >>
> > > >> dj03: If the executor is not used by the share coordinator, it should
> > not
> > > >>> expose any metrics about it. Is it possible to remove them?
> > > >>
> > > >> I've removed them from the KIP. We can add a parameter to the
> > coordinator
> > > >> metrics class to control whether they are visible.
> > > >>
> > > >> dj04: Is having one group coordinator executor thread sufficient by
> > > >>> default for common workloads?
> > > >>
> > > >> Yes and no. I expect it will be very difficult to overload an entire
> > > >> thread, i.e. submit work faster than it can complete it. But updating
> > the
> > > >> default to two threads could be good for reducing delays due to
> > > >> simultaneous assignor runs. I've raised the default to 2 threads.
> > > >>
> > > >> dj05: It seems you propose enabling the minimum assignor interval
> > with a
> > > >>> default of 5 seconds. However, the offloading is not enabled by
> > default. Is
> > > >>> the first one enough to guarantee the stability of the group
> > coordinator?
> > > >>> How do you foresee enabling the second one in the future? It would
> > be great
> > > >>> if you could address this in the KIP. We need a clear motivation for
> > > >>> changing the default behavior and a plan for the future.
> > > >>
> > > >> I initially thought that offloading would increase rebalance times by
> > 1
> > > >> heartbeat and so didn't propose turning it on by default. But after
> > some
> > > >> more thinking, I believe both features will increase rebalance times
> > by 1
> > > >> heartbeat interval and the increase shouldn't stack. The minimum
> > assignor
> > > >> interval only impacts groups with more than 2 members, while
> > offloading
> > > >> only impacts groups with a single member. This is because in the other
> > > >> cases, the extra delays are folded into existing revocation +
> > heartbeat
> > > >> delays. Note that share groups have no revocation so always see
> > increased
> > > >> rebalance times. I've updated the KIP to add the analysis of rebalance
> > > >> times and propose turning both features on by default.
> > > >>
> > > >> dj06: Based on its description, I wonder whether `
> > > >>> consumer.min.assignor.interval.ms` should be called `
> > > >>> consumer.min.assignment.interval.ms`. What do you think?
> > > >>
> > > >> Thanks, I've renamed the config options in the KIP. What about the
> > > >> assignor.offload.enable configs?
> > > >>
> > > >> dj07: It is not possible to enable/disable the offloading at the group
> > > >>> level. This makes sense to me but it would be great to explain the
> > > >>> rationale for it in the KIP.
> > > >>
> > > >> Thinking about it, there's nothing stopping us from configuring
> > > >> offloading at the group level. In fact it might be desirable for some
> > users
> > > >> to disable offloading at the group coordinator level to keep
> > rebalances
> > > >> fast and only enable it for problematic large groups. I've added a
> > > >> group-level override to the KIP.
> > > >>
> > > >> On Tue, Jan 20, 2026 at 1:38 PM Lianet Magrans <[email protected]>
> > > >> wrote:
> > > >>
> > > >>> Hi Sean, thanks for the KIP.
> > > >>>
> > > >>> LM1: About group.initial.rebalance.delay.ms, I expect the
> > interaction
> > > >>> with the interval is just as described for the streams initial delay
> > and
> > > >>> interval, correct? Should we clarify that in the KIP (it only
> > mentions the
> > > >>> streams case)
> > > >>>
> > > >>> LM2: The KIP refers to batching assignment re-calculations triggered
> > by
> > > >>> member subscriptions changes, but I expect the batching mechanism
> > applies
> > > >>> the same when the assignment re-calculation is triggered by metadata
> > > >>> changes (i.e. topic/partition created or deleted), without any HB
> > changing
> > > >>> subscriptions. Is my understanding correct?
> > > >>>
> > > >>> LM3: About this section: "*When there is an in-flight assignor run
> > for
> > > >>> the group, there is no new target assignment. We will trigger the
> > next
> > > >>> assignor run on a future heartbeat.*". I expect that the next
> > assignor
> > > >>> run will be triggered on the next HB from this or from any other
> > member of
> > > >>> the group, received after the interval expires (without the members
> > > >>> re-sending the subscription change). Is my expectation correct? If
> > so,
> > > >>> it may be worth clarifying in the KIP to avoid confusion with
> > client-side
> > > >>> implementations.
> > > >>>
> > > >>> Thanks!
> > > >>> Lianet
> > > >>>
> > > >>>
> > > >>>
> > > >>> On Tue, Jan 13, 2026 at 1:23 AM Sean Quah via dev <
> > [email protected]>
> > > >>> wrote:
> > > >>>
> > > >>>> sq01: We also have to update the SyncGroup request handling to only
> > > >>>> return
> > > >>>> REBALANCE_IN_PROGRESS when the member's epoch is behind the target
> > > >>>> assignment epoch, not the group epoch. Thanks to Dongnuo for
> > pointing
> > > >>>> this
> > > >>>> out.
> > > >>>>
> > > >>>> On Thu, Jan 8, 2026 at 5:40 PM Dongnuo Lyu via dev <
> > > >>>> [email protected]>
> > > >>>> wrote:
> > > >>>>
> > > >>>> > Hi Sean, thanks for the KIP! I have a few questions as follows.
> > > >>>> >
> > > >>>> > dl01: Could we mention the handling when the group metadata or
> > topic
> > > >>>> > partition metadata is changed or deleted during the async assignor
> > > >>>> run?
> > > >>>> >
> > > >>>> > dl02: This might be a question for the overall coordinator
> > executor -
> > > >>>> do we
> > > >>>> > have plans to apply an explicit size limit to the executor queue?
> > If
> > > >>>> many
> > > >>>> > groups trigger offloaded assignments simultaneously, should we
> > apply
> > > >>>> some
> > > >>>> > backpressure for protection?
> > > >>>> >
> > > >>>> > Also resonate with dj05, for small groups default `
> > > >>>> > min.assignor.interval.ms`
> > > >>>> > to 5s might not be necessary, so not sure if we want to make the
> > batch
> > > >>>> > assignment default. Or it might be good to have a per group
> > > >>>> enablement.
> > > >>>> >
> > > >>>> > Thanks
> > > >>>> > Dongnuo
> > > >>>> >
> > > >>>>
> > > >>>
> >
