Thank you for your comments and participation in the discussion, David,
Justine and Alex.

You are right! The KIP is missing a lot of details about the
motivation. I apologize for the confusion created by my earlier
statement in this thread about reducing the downtime. I will ask
Christo to update the KIP.

Meanwhile, as a summary: the KIP does not attempt to solve the problem
of losing consumer offsets after a partition increase. Instead, the
objective of the KIP is to reduce the time to recovery, i.e. the time
until reads can resume after such an event has occurred. Prior to this
KIP, the impact of the change manifests when one of the brokers is
restarted, and the consumer groups remain in an erroring/undefined
state until all brokers have finished restarting. During a rolling
restart, this makes the time to recovery proportional to the number of
brokers in the cluster. After this KIP is implemented, we would no
longer wait for a broker restart to pick up the new partitions;
instead, all brokers will be notified about the change in the number
of partitions immediately. This would reduce the duration during which
consumer groups are in an erroring/undefined state from the length of
the rolling restart to the time it takes to process the LeaderAndIsr
(LISR) requests across the cluster. Hence, a (small) win!
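
To make the dependency on the partition count concrete, here is a
minimal, self-contained Java sketch of how a group id resolves to a
__consumer_offsets partition. It mirrors the modulo-hash logic used by
the group coordinator (the group names are made up for illustration);
changing the partition count changes the result for most groups, which
is why their existing offsets/metadata can no longer be found:

import java.util.stream.Stream;

public class CoordinatorPartitionDemo {

    // Pick the __consumer_offsets partition that hosts a group's
    // offsets and metadata: a non-negative hash of the group id,
    // modulo the partition count of __consumer_offsets.
    static int partitionFor(String groupId, int offsetsTopicPartitions) {
        return (groupId.hashCode() & 0x7fffffff) % offsetsTopicPartitions;
    }

    public static void main(String[] args) {
        // The same group maps to a different partition once the count
        // changes from the default 50 to 100.
        Stream.of("orders-app", "billing-app", "audit-app").forEach(g ->
            System.out.printf("%s -> %d (50 partitions), %d (100 partitions)%n",
                g, partitionFor(g, 50), partitionFor(g, 100)));
    }
}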

I hope this explanation sheds some more light on the context.

Why do users change the number of partitions of __consumer_offsets?
1. They change it accidentally OR
2. They increase it to scale with an increase in the number of
consumers. This is because (correct me if I am wrong) with more
consumers we can hit the limits of single-partition throughput while
reading from/writing to __consumer_offsets. This is a genuine use
case, and the downside of losing existing metadata/offsets is
acceptable to them.
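
For reference, the increase in case #2 is usually done with
kafka-topics.sh --alter or, equivalently, through the Admin client.
Here is a rough sketch of the latter (the bootstrap address and the
target partition count are placeholders):

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewPartitions;

public class IncreaseOffsetsPartitions {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (Admin admin = Admin.create(props)) {
            // Grow __consumer_offsets from the default 50 partitions to
            // 100. As discussed above, most groups then hash to different
            // partitions, so their existing offsets/metadata are not found.
            admin.createPartitions(Collections.singletonMap(
                    "__consumer_offsets", NewPartitions.increaseTo(100)))
                 .all().get();
        }
    }
}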

How do we ideally fix it?
An ideal solution would allow us to increase the number of partitions
of __consumer_offsets without losing existing metadata. We either need
to make the partition assignment for a consumer group "sticky", such
that existing groups are not re-assigned to new partitions, OR we need
to transfer the existing data in __consumer_offsets as per the new
partitioning. Both of these approaches are long-term fixes and require
a separate discussion.

What can we do in the short term?
In the short term, we can either block users from changing the number
of partitions (which might not be possible due to use case #2 above)
OR we can at least improve (not fix, but just improve!) the current
situation by reducing the time to recovery, which is what this KIP
proposes.

Let's circle back on this discussion as soon as the KIP is updated
with more details.

--
Divij Vaidya



On Tue, Apr 4, 2023 at 8:00 PM Alexandre Dupriez <
alexandre.dupr...@gmail.com> wrote:

> Hi Christo,
>
> Thanks for the KIP. Apologies for the delayed review.
>
> At a high-level, I am not sure if the KIP really solves the problem it
> intends to.
>
> More specifically, the KIP mentions that once a broker is restarted
> and the group coordinator becomes aware of the new partition count of
> the consumer offsets topic, the problem is mitigated. However, how do
> we access the metadata and offsets recorded in a partition once it is
> no longer the partition a consumer group resolves to?
>
> Thanks,
> Alexandre
>
> Le mar. 4 avr. 2023 à 18:34, Justine Olshan
> <jols...@confluent.io.invalid> a écrit :
> >
> > Hi,
> >
> > I'm also a bit unsure of the motivation here. Is there a need to change
> the
> > number of partitions for this topic?
> >
> > Justine
> >
> > On Tue, Apr 4, 2023 at 10:07 AM David Jacot <david.ja...@gmail.com>
> wrote:
> >
> > > Hi,
> > >
> > > I am not very comfortable with the proposal of this KIP. The main
> issue is
> > > that changing the number of partitions means that all group metadata is
> > > lost because the hashing changes. I wonder if we should just disallow
> > > changing the number of partitions entirely. Did we consider something
> like
> > > this?
> > >
> > > Best,
> > > David
> > >
> > > Le mar. 4 avr. 2023 à 17:57, Divij Vaidya <divijvaidy...@gmail.com> a
> > > écrit :
> > >
> > > > FYI, a user faced this problem and reached out to us in the mailing
> list
> > > > [1]. Implementation of this KIP could have reduced the downtime for
> these
> > > > customers.
> > > >
> > > > Christo, would you like to create a JIRA and associate with the KIP
> so
> > > that
> > > > we can continue to collect cases in the JIRA where users have faced
> this
> > > > problem?
> > > >
> > > > [1] https://lists.apache.org/thread/zoowjshvdpkh5p0p7vqjd9fq8xvkr1nd
> > > >
> > > > --
> > > > Divij Vaidya
> > > >
> > > >
> > > >
> > > > On Wed, Jan 18, 2023 at 9:52 AM Christo Lolov <
> christolo...@gmail.com>
> > > > wrote:
> > > >
> > > > > Greetings,
> > > > >
> > > > > I am bumping the below DISCUSSion thread for KIP-895. The KIP
> presents
> > > a
> > > > > situation where consumer groups are in an undefined state until a
> > > rolling
> > > > > restart of a cluster is performed. While I have demonstrated the
> > > > behaviour
> > > > > using a cluster using Zookeeper I believe the same problem can be
> shown
> > > > in
> > > > > a KRaft cluster. Please let me know your opinions on the problem
> and
> > > the
> > > > > presented solution.
> > > > >
> > > > > Best,
> > > > > Christo
> > > > >
> > > > > On Thursday, 29 December 2022 at 14:19:27 GMT, Christo
> > > > > > <christo_lo...@yahoo.com.invalid> wrote:
> > > > > >
> > > > > >
> > > > > > Hello!
> > > > > > I would like to start this discussion thread on KIP-895:
> Dynamically
> > > > > > refresh partition count of __consumer_offsets.
> > > > > > The KIP proposes to alter brokers so that they refresh the
> partition
> > > > > count
> > > > > > of __consumer_offsets used to determine group coordinators
> without
> > > > > > requiring a rolling restart of the cluster.
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-895%3A+Dynamically+refresh+partition+count+of+__consumer_offsets
> > > > > >
> > > > > > Let me know your thoughts on the matter!
> > > > > > Best, Christo
> > > > > >
> > > > >
> > > >
> > >
>
