Thanks for raising this here, Calvin. Since this is the first "streaming results" type API in KafkaAdminClient (as far as I know), we're treading new ground here.
As you mentioned, we can either accept a consumer or return some iterable result. Returning a java.util.Stream is also an option, and a bit more modern/convenient than java.util.Iterator. Personally, I like the consumer approach, but I'm interested in hearing other's opinions. This actually brings up another question: Do we think it's safe to assume that one topic's description can fit into memory? The RPC supports paging across partitions within a single topic, so maybe the admin API should as well? -David On Fri, Feb 23, 2024 at 12:22 PM Calvin Liu <ca...@confluent.io> wrote: > Hey, > As we agreed to implement the pagination for the new API > DescribeTopicPartitions, the client side must also add a proper interface > to handle the pagination. > The current KafkaAdminClient.describeTopics returns > the DescribeTopicsResult which is the future for querying all the topics. > It is awkward to fit the pagination into it because > > 1. Each future corresponds to a topic. We also want to have the > pagination on huge topics for their partitions. > 2. To avoid OOM, we should only fetch the new topics when we need them > and release the used topics. Especially the main use case of looping the > topic list is when the client prints all the topics. > > So, to better serve the pagination, @David Arthur > <david.art...@confluent.io> suggested to add a new interface in the Admin > client between the following 2. > > describeTopics(TopicCollection topics, DescribeTopicsOptions options, > Consumer<TopicDescription>); > > Iterator<TopicDescription> describeTopics(TopicCollection topics, > DescribeTopicsOptions options); > > David and I would prefer the first Consumer version which works better as a > stream purposes. > > > On Wed, Oct 11, 2023 at 4:28 PM Calvin Liu <ca...@confluent.io> wrote: > >> Hi David, >> Thanks for the comment. >> Yes, we can separate the ELR enablement from the metadata version. It is >> also helpful to avoid blocking the following MV releases if the user is not >> ready for ELR. >> One thing to correct is that, the Unclean recovery is controlled >> by unclean.recovery.manager.enabled, a separate config >> from unclean.recovery.strategy. It determines whether unclean recovery will >> be used in an unclean leader election. >> Thanks >> >> On Wed, Oct 11, 2023 at 4:11 PM David Arthur <mum...@gmail.com> wrote: >> >>> One thing we should consider is a static config to totally enable/disable >>> the ELR feature. If I understand the KIP correctly, we can effectively >>> disable the unclean recovery by setting the recovery strategy config to >>> "none". >>> >>> This would make development and rollout of this feature a bit smoother. >>> Consider the case that we find bugs in ELR after a cluster has updated to >>> its MetadataVersion. It's simpler to disable the feature through config >>> rather than going through a MetadataVersion downgrade (once that's >>> supported). >>> >>> Does that make sense? >>> >>> -David >>> >>> On Wed, Oct 11, 2023 at 1:40 PM Calvin Liu <ca...@confluent.io.invalid> >>> wrote: >>> >>> > Hi Jun >>> > -Good catch, yes, we don't need the -1 in the DescribeTopicRequest. >>> > -No new value is added. The LeaderRecoveryState will still be set to 1 >>> if >>> > we have an unclean leader election. The unclean leader election >>> includes >>> > the old random way and the unclean recovery. During the unclean >>> recovery, >>> > the LeaderRecoveryState will not change until the controller decides to >>> > update the records with the new leader. >>> > Thanks >>> > >>> > On Wed, Oct 11, 2023 at 9:02 AM Jun Rao <j...@confluent.io.invalid> >>> wrote: >>> > >>> > > Hi, Calvin, >>> > > >>> > > Another thing. Currently, when there is an unclean leader election, >>> we >>> > set >>> > > the LeaderRecoveryState in PartitionRecord and PartitionChangeRecord >>> to >>> > 1. >>> > > With the KIP, will there be new values for LeaderRecoveryState? If >>> not, >>> > > when will LeaderRecoveryState be set to 1? >>> > > >>> > > Thanks, >>> > > >>> > > Jun >>> > > >>> > > On Tue, Oct 10, 2023 at 4:24 PM Jun Rao <j...@confluent.io> wrote: >>> > > >>> > > > Hi, Calvin, >>> > > > >>> > > > One more comment. >>> > > > >>> > > > "The first partition to fetch details for. -1 means to fetch all >>> > > > partitions." It seems that FirstPartitionId of 0 naturally means >>> > fetching >>> > > > all partitions? >>> > > > >>> > > > Thanks, >>> > > > >>> > > > Jun >>> > > > >>> > > > On Tue, Oct 10, 2023 at 12:40 PM Calvin Liu >>> <ca...@confluent.io.invalid >>> > > >>> > > > wrote: >>> > > > >>> > > >> Hi Jun, >>> > > >> Yeah, with the current Metadata request handling, we only return >>> > errors >>> > > on >>> > > >> the Topic level, like topic not found. It seems that querying a >>> > specific >>> > > >> partition is not a valid use case. Will update. >>> > > >> Thanks >>> > > >> >>> > > >> On Tue, Oct 10, 2023 at 11:55 AM Jun Rao <j...@confluent.io.invalid >>> > >>> > > >> wrote: >>> > > >> >>> > > >> > Hi, Calvin, >>> > > >> > >>> > > >> > 60. If the range query has errors for some of the partitions, >>> do we >>> > > >> expect >>> > > >> > different responses when querying particular partitions? >>> > > >> > >>> > > >> > Thanks, >>> > > >> > >>> > > >> > Jun >>> > > >> > >>> > > >> > On Tue, Oct 10, 2023 at 10:50 AM Calvin Liu >>> > > <ca...@confluent.io.invalid >>> > > >> > >>> > > >> > wrote: >>> > > >> > >>> > > >> > > Hi Jun >>> > > >> > > 60. Yes, it is a good question. I was thinking the API could >>> be >>> > > >> flexible >>> > > >> > to >>> > > >> > > query the particular partitions if the range query has errors >>> for >>> > > >> some of >>> > > >> > > the partitions. Not sure whether it is a valid assumption, >>> what do >>> > > you >>> > > >> > > think? >>> > > >> > > >>> > > >> > > 61. Good point, I will update them to partition level with the >>> > same >>> > > >> > limit. >>> > > >> > > >>> > > >> > > 62. Sure, will do. >>> > > >> > > >>> > > >> > > Thanks >>> > > >> > > >>> > > >> > > On Tue, Oct 10, 2023 at 10:12 AM Jun Rao >>> <j...@confluent.io.invalid >>> > > >>> > > >> > wrote: >>> > > >> > > >>> > > >> > > > Hi, Calvin, >>> > > >> > > > >>> > > >> > > > A few more minor comments on your latest update. >>> > > >> > > > >>> > > >> > > > 60. DescribeTopicRequest: When will the Partitions field be >>> > used? >>> > > It >>> > > >> > > seems >>> > > >> > > > that the FirstPartitionId field is enough for AdminClient >>> usage. >>> > > >> > > > >>> > > >> > > > 61. Could we make the limit for DescribeTopicRequest, >>> > > >> > > ElectLeadersRequest, >>> > > >> > > > GetReplicaLogInfo consistent? Currently, >>> ElectLeadersRequest's >>> > > >> limit is >>> > > >> > > at >>> > > >> > > > topic level and GetReplicaLogInfo has a different partition >>> > level >>> > > >> limit >>> > > >> > > > from DescribeTopicRequest. >>> > > >> > > > >>> > > >> > > > 62. Should ElectLeadersRequest.DesiredLeaders be at the same >>> > level >>> > > >> as >>> > > >> > > > ElectLeadersRequest.TopicPartitions.Partitions? In the KIP, >>> it >>> > > looks >>> > > >> > like >>> > > >> > > > it's at the same level as >>> ElectLeadersRequest.TopicPartitions. >>> > > >> > > > >>> > > >> > > > Thanks, >>> > > >> > > > >>> > > >> > > > Jun >>> > > >> > > > >>> > > >> > > > On Wed, Oct 4, 2023 at 3:55 PM Calvin Liu >>> > > >> <ca...@confluent.io.invalid> >>> > > >> > > > wrote: >>> > > >> > > > >>> > > >> > > > > Hi David, >>> > > >> > > > > Thanks for the comments. >>> > > >> > > > > ---- >>> > > >> > > > > I thought that a new snapshot with the downgraded MV is >>> > created >>> > > in >>> > > >> > this >>> > > >> > > > > case. Isn’t it the case? >>> > > >> > > > > Yes, you are right, a metadata delta will be generated >>> after >>> > the >>> > > >> MV >>> > > >> > > > > downgrade. Then the user can start the software downgrade. >>> > > >> > > > > ----- >>> > > >> > > > > Could you also elaborate a bit more on the reasoning >>> behind >>> > > adding >>> > > >> > the >>> > > >> > > > > limits to the admin RPCs? This is a new pattern in Kafka >>> so it >>> > > >> would >>> > > >> > be >>> > > >> > > > > good to clear on the motivation. >>> > > >> > > > > Thanks to Colin for bringing it up. The current >>> > MetadataRequest >>> > > >> does >>> > > >> > > not >>> > > >> > > > > have a limit on the number of topics to query in a single >>> > > request. >>> > > >> > > > Massive >>> > > >> > > > > requests can mess up the JVM. We want to have some sort of >>> > > >> throttle >>> > > >> > on >>> > > >> > > > the >>> > > >> > > > > new APIs. >>> > > >> > > > > ----- >>> > > >> > > > > Could you also explain how the client is supposed to >>> handle >>> > the >>> > > >> > > > > topics/partitions above the limit? I suppose that it will >>> have >>> > > to >>> > > >> > retry >>> > > >> > > > > those, correct? >>> > > >> > > > > Corrent. For the official admin clients, it will split the >>> > large >>> > > >> > > request >>> > > >> > > > > into proper pieces and query one after another. >>> > > >> > > > > ----- >>> > > >> > > > > My understanding is that the topics/partitions above the >>> limit >>> > > >> will >>> > > >> > be >>> > > >> > > > > failed with an invalid exception error. I wonder if this >>> > choice >>> > > is >>> > > >> > > > > judicious because the invalide request exception is >>> usually >>> > > >> fatal. It >>> > > >> > > may >>> > > >> > > > > be better to use an new and explicit error for this case. >>> > > >> > > > > >>> > > >> > > > > Thanks for bringing this up. How about >>> > "REQUEST_LIMIT_REACHED"? >>> > > >> > > > > -------- >>> > > >> > > > > It seems that we still need to specify the changes to the >>> > admin >>> > > >> api >>> > > >> > to >>> > > >> > > > > accommodate the new or updated apis. Do you plan to add >>> them? >>> > > >> > > > > Try to cover the following >>> > > >> > > > > 1. The admin client will use the new DescribeTopicRequest >>> to >>> > > query >>> > > >> > the >>> > > >> > > > > topics >>> > > >> > > > > 2. Mention the API limit and the new retriable error. >>> > > >> > > > > 3. Output changes for the admin client when describing a >>> topic >>> > > >> (new >>> > > >> > > > fields >>> > > >> > > > > of ELR...) >>> > > >> > > > > 4. Changes to data structures like TopicPartitionInfo to >>> > include >>> > > >> the >>> > > >> > > ELR. >>> > > >> > > > > Anything else I missed? >>> > > >> > > > > >>> > > >> > > > > Thanks! >>> > > >> > > > > >>> > > >> > > > > >>> > > >> > > > > >>> > > >> > > > > >>> > > >> > > > > >>> > > >> > > > > On Wed, Oct 4, 2023 at 12:27 PM David Jacot < >>> > > >> david.ja...@gmail.com> >>> > > >> > > > wrote: >>> > > >> > > > > >>> > > >> > > > > > Hi Calvin, >>> > > >> > > > > > >>> > > >> > > > > > I thought that a new snapshot with the downgraded MV is >>> > > created >>> > > >> in >>> > > >> > > this >>> > > >> > > > > > case. Isn’t it the case? >>> > > >> > > > > > >>> > > >> > > > > > Could you also elaborate a bit more on the reasoning >>> behind >>> > > >> adding >>> > > >> > > the >>> > > >> > > > > > limits to the admin RPCs? This is a new pattern in >>> Kafka so >>> > it >>> > > >> > would >>> > > >> > > be >>> > > >> > > > > > good to clear on the motivation. >>> > > >> > > > > > >>> > > >> > > > > > Could you also explain how the client is supposed to >>> handle >>> > > the >>> > > >> > > > > > topics/partitions above the limit? I suppose that it >>> will >>> > have >>> > > >> to >>> > > >> > > retry >>> > > >> > > > > > those, correct? >>> > > >> > > > > > >>> > > >> > > > > > My understanding is that the topics/partitions above the >>> > limit >>> > > >> will >>> > > >> > > be >>> > > >> > > > > > failed with an invalid exception error. I wonder if this >>> > > choice >>> > > >> is >>> > > >> > > > > > judicious because the invalide request exception is >>> usually >>> > > >> fatal. >>> > > >> > It >>> > > >> > > > may >>> > > >> > > > > > be better to use an new and explicit error for this >>> case. >>> > > >> > > > > > >>> > > >> > > > > > It seems that we still need to specify the changes to >>> the >>> > > admin >>> > > >> api >>> > > >> > > to >>> > > >> > > > > > accommodate the new or updated apis. Do you plan to add >>> > them? >>> > > >> > > > > > >>> > > >> > > > > > Best, >>> > > >> > > > > > David >>> > > >> > > > > > >>> > > >> > > > > > Le mer. 4 oct. 2023 à 20:39, Calvin Liu >>> > > >> <ca...@confluent.io.invalid >>> > > >> > > >>> > > >> > > a >>> > > >> > > > > > écrit : >>> > > >> > > > > > >>> > > >> > > > > > > Hi Jun, >>> > > >> > > > > > > After the MV downgrade, the controller will write in >>> the >>> > old >>> > > >> > > version >>> > > >> > > > of >>> > > >> > > > > > the >>> > > >> > > > > > > PartitionRecord/PartitionChangeRecord. If I understand >>> > > >> correctly, >>> > > >> > > it >>> > > >> > > > is >>> > > >> > > > > > > possible to downgrade the software version if the >>> > controller >>> > > >> only >>> > > >> > > has >>> > > >> > > > > to >>> > > >> > > > > > > handle old version records. >>> > > >> > > > > > > However, the controller will not automatically >>> rewrite the >>> > > >> > > > > > PartitionRecord >>> > > >> > > > > > > with the old version unless there is a partition >>> update. >>> > > Then, >>> > > >> > the >>> > > >> > > > user >>> > > >> > > > > > may >>> > > >> > > > > > > have to wait an unknown amount of time before the >>> software >>> > > >> > > downgrades >>> > > >> > > > > > > unless they do a roll to force update every >>> partition. If >>> > it >>> > > >> > makes >>> > > >> > > > > > sense, I >>> > > >> > > > > > > can mention these steps to do a software downgrade. >>> > > >> > > > > > > Thanks >>> > > >> > > > > > > >>> > > >> > > > > > > On Wed, Oct 4, 2023 at 11:20 AM Jun Rao >>> > > >> <j...@confluent.io.invalid >>> > > >> > > >>> > > >> > > > > > wrote: >>> > > >> > > > > > > >>> > > >> > > > > > > > Hi, Calvin and Justine, >>> > > >> > > > > > > > >>> > > >> > > > > > > > Historically, when we change the record format in >>> the >>> > log, >>> > > >> we >>> > > >> > > don't >>> > > >> > > > > > > support >>> > > >> > > > > > > > software version downgrading. >>> > > >> > > > > > > > >>> > > >> > > > > > > > For the record format change in the metadata log, >>> have >>> > we >>> > > >> > thought >>> > > >> > > > > about >>> > > >> > > > > > > > forcing the write of the latest metadata records >>> with >>> > the >>> > > >> old >>> > > >> > > > version >>> > > >> > > > > > > > during MV downgrading? This will in theory allow >>> the old >>> > > >> > version >>> > > >> > > of >>> > > >> > > > > the >>> > > >> > > > > > > > software to obtain the latest metadata. >>> > > >> > > > > > > > >>> > > >> > > > > > > > Thanks, >>> > > >> > > > > > > > >>> > > >> > > > > > > > Jun >>> > > >> > > > > > > > >>> > > >> > > > > > > > On Wed, Oct 4, 2023 at 9:53 AM Justine Olshan >>> > > >> > > > > > > <jols...@confluent.io.invalid >>> > > >> > > > > > > > > >>> > > >> > > > > > > > wrote: >>> > > >> > > > > > > > >>> > > >> > > > > > > > > Sorry -- not MV but software version. >>> > > >> > > > > > > > > >>> > > >> > > > > > > > > On Wed, Oct 4, 2023 at 9:51 AM Justine Olshan < >>> > > >> > > > > jols...@confluent.io> >>> > > >> > > > > > > > > wrote: >>> > > >> > > > > > > > > >>> > > >> > > > > > > > > > Catching up with this discussion. >>> > > >> > > > > > > > > > >>> > > >> > > > > > > > > > I was just curious -- have we had other >>> instances >>> > > where >>> > > >> > > > > downgrading >>> > > >> > > > > > > MV >>> > > >> > > > > > > > is >>> > > >> > > > > > > > > > not supported? I think Kafka typically tries to >>> > > support >>> > > >> > > > > downgrades, >>> > > >> > > > > > > > and I >>> > > >> > > > > > > > > > couldn't think of other examples. >>> > > >> > > > > > > > > > >>> > > >> > > > > > > > > > Thanks, >>> > > >> > > > > > > > > > Justine >>> > > >> > > > > > > > > > >>> > > >> > > > > > > > > > On Wed, Oct 4, 2023 at 9:40 AM Calvin Liu >>> > > >> > > > > > <ca...@confluent.io.invalid >>> > > >> > > > > > > > >>> > > >> > > > > > > > > > wrote: >>> > > >> > > > > > > > > > >>> > > >> > > > > > > > > >> Hi Jun, >>> > > >> > > > > > > > > >> 54. Marked the software downgrading is not >>> > supported. >>> > > >> As >>> > > >> > the >>> > > >> > > > old >>> > > >> > > > > > > > > >> controller >>> > > >> > > > > > > > > >> will not understand the new PartitionRecord and >>> > > >> > > > > > > PartitionChangeRecord. >>> > > >> > > > > > > > > >> Thanks! >>> > > >> > > > > > > > > >> >>> > > >> > > > > > > > > >> On Wed, Oct 4, 2023 at 9:12 AM Jun Rao >>> > > >> > > > <j...@confluent.io.invalid >>> > > >> > > > > > >>> > > >> > > > > > > > > wrote: >>> > > >> > > > > > > > > >> >>> > > >> > > > > > > > > >> > Hi, Calvin, >>> > > >> > > > > > > > > >> > >>> > > >> > > > > > > > > >> > Thanks for the reply. Just one more comment. >>> > > >> > > > > > > > > >> > >>> > > >> > > > > > > > > >> > 54. It seems that downgrading MV is >>> supported. Is >>> > > >> > > > downgrading >>> > > >> > > > > > the >>> > > >> > > > > > > > > >> software >>> > > >> > > > > > > > > >> > version supported? It would be useful to >>> document >>> > > >> that. >>> > > >> > > > > > > > > >> > >>> > > >> > > > > > > > > >> > Thanks, >>> > > >> > > > > > > > > >> > >>> > > >> > > > > > > > > >> > Jun >>> > > >> > > > > > > > > >> > >>> > > >> > > > > > > > > >> > On Tue, Oct 3, 2023 at 4:55 PM Artem Livshits >>> > > >> > > > > > > > > >> > <alivsh...@confluent.io.invalid> wrote: >>> > > >> > > > > > > > > >> > >>> > > >> > > > > > > > > >> > > Hi Colin, >>> > > >> > > > > > > > > >> > > >>> > > >> > > > > > > > > >> > > I think in your example >>> "do_unclean_recovery" >>> > > would >>> > > >> > need >>> > > >> > > > to >>> > > >> > > > > do >>> > > >> > > > > > > > > >> different >>> > > >> > > > > > > > > >> > > things depending on the strategy. >>> > > >> > > > > > > > > >> > > >>> > > >> > > > > > > > > >> > > do_unclean_recovery() { >>> > > >> > > > > > > > > >> > > if (unclean.recovery.manager.enabled) { >>> > > >> > > > > > > > > >> > > if (strategy == Aggressive) >>> > > >> > > > > > > > > >> > > use >>> > > >> > UncleanRecoveryManager(waitLastKnownERL=false) >>> > > >> > > > // >>> > > >> > > > > > > just >>> > > >> > > > > > > > > >> inspect >>> > > >> > > > > > > > > >> > > logs from whoever is available >>> > > >> > > > > > > > > >> > > else >>> > > >> > > > > > > > > >> > > use >>> > > >> > UncleanRecoveryManager(waitLastKnownERL=true) >>> > > >> > > > // >>> > > >> > > > > > > must >>> > > >> > > > > > > > > wait >>> > > >> > > > > > > > > >> > for >>> > > >> > > > > > > > > >> > > at least last known ELR >>> > > >> > > > > > > > > >> > > } else { >>> > > >> > > > > > > > > >> > > if (strategy == Aggressive) >>> > > >> > > > > > > > > >> > > choose the last known leader if that >>> is >>> > > >> > available, >>> > > >> > > > or >>> > > >> > > > > a >>> > > >> > > > > > > > random >>> > > >> > > > > > > > > >> > leader >>> > > >> > > > > > > > > >> > > if not) >>> > > >> > > > > > > > > >> > > else >>> > > >> > > > > > > > > >> > > wait for last known leader to get >>> back >>> > > >> > > > > > > > > >> > > } >>> > > >> > > > > > > > > >> > > } >>> > > >> > > > > > > > > >> > > >>> > > >> > > > > > > > > >> > > The idea is that the Aggressive strategy >>> would >>> > > >> kick in >>> > > >> > > as >>> > > >> > > > > soon >>> > > >> > > > > > > as >>> > > >> > > > > > > > we >>> > > >> > > > > > > > > >> lost >>> > > >> > > > > > > > > >> > > the leader and would pick a leader from >>> whoever >>> > > is >>> > > >> > > > > available; >>> > > >> > > > > > > but >>> > > >> > > > > > > > > the >>> > > >> > > > > > > > > >> > > Balanced will only kick in when ELR is >>> empty >>> > and >>> > > >> will >>> > > >> > > wait >>> > > >> > > > > for >>> > > >> > > > > > > the >>> > > >> > > > > > > > > >> > brokers >>> > > >> > > > > > > > > >> > > that likely have most data to be available. >>> > > >> > > > > > > > > >> > > >>> > > >> > > > > > > > > >> > > On Tue, Oct 3, 2023 at 3:04 PM Colin >>> McCabe < >>> > > >> > > > > > cmcc...@apache.org >>> > > >> > > > > > > > >>> > > >> > > > > > > > > >> wrote: >>> > > >> > > > > > > > > >> > > >>> > > >> > > > > > > > > >> > > > On Tue, Oct 3, 2023, at 10:49, Jun Rao >>> wrote: >>> > > >> > > > > > > > > >> > > > > Hi, Calvin, >>> > > >> > > > > > > > > >> > > > > >>> > > >> > > > > > > > > >> > > > > Thanks for the update KIP. A few more >>> > > comments. >>> > > >> > > > > > > > > >> > > > > >>> > > >> > > > > > > > > >> > > > > 41. Why would a user choose the option >>> to >>> > > >> select a >>> > > >> > > > > random >>> > > >> > > > > > > > > replica >>> > > >> > > > > > > > > >> as >>> > > >> > > > > > > > > >> > > the >>> > > >> > > > > > > > > >> > > > > leader instead of using >>> > > >> > > > > > unclean.recovery.strateg=Aggressive? >>> > > >> > > > > > > > It >>> > > >> > > > > > > > > >> seems >>> > > >> > > > > > > > > >> > > > that >>> > > >> > > > > > > > > >> > > > > the latter is strictly better? If >>> that's >>> > not >>> > > >> the >>> > > >> > > case, >>> > > >> > > > > > could >>> > > >> > > > > > > > we >>> > > >> > > > > > > > > >> fold >>> > > >> > > > > > > > > >> > > this >>> > > >> > > > > > > > > >> > > > > option under unclean.recovery.strategy >>> > > instead >>> > > >> of >>> > > >> > > > > > > introducing >>> > > >> > > > > > > > a >>> > > >> > > > > > > > > >> > > separate >>> > > >> > > > > > > > > >> > > > > config? >>> > > >> > > > > > > > > >> > > > >>> > > >> > > > > > > > > >> > > > Hi Jun, >>> > > >> > > > > > > > > >> > > > >>> > > >> > > > > > > > > >> > > > I thought the flow of control was: >>> > > >> > > > > > > > > >> > > > >>> > > >> > > > > > > > > >> > > > If there is no leader for the partition { >>> > > >> > > > > > > > > >> > > > If (there are unfenced ELR members) { >>> > > >> > > > > > > > > >> > > > choose_an_unfenced_ELR_member >>> > > >> > > > > > > > > >> > > > } else if (there are fenced ELR >>> members AND >>> > > >> > > > > > > > > strategy=Aggressive) { >>> > > >> > > > > > > > > >> > > > do_unclean_recovery >>> > > >> > > > > > > > > >> > > > } else if (there are no ELR members AND >>> > > >> strategy >>> > > >> > != >>> > > >> > > > > None) >>> > > >> > > > > > { >>> > > >> > > > > > > > > >> > > > do_unclean_recovery >>> > > >> > > > > > > > > >> > > > } else { >>> > > >> > > > > > > > > >> > > > do nothing about the missing leader >>> > > >> > > > > > > > > >> > > > } >>> > > >> > > > > > > > > >> > > > } >>> > > >> > > > > > > > > >> > > > >>> > > >> > > > > > > > > >> > > > do_unclean_recovery() { >>> > > >> > > > > > > > > >> > > > if (unclean.recovery.manager.enabled) >>> { >>> > > >> > > > > > > > > >> > > > use UncleanRecoveryManager >>> > > >> > > > > > > > > >> > > > } else { >>> > > >> > > > > > > > > >> > > > choose the last known leader if that >>> is >>> > > >> > available, >>> > > >> > > > or >>> > > >> > > > > a >>> > > >> > > > > > > > random >>> > > >> > > > > > > > > >> > leader >>> > > >> > > > > > > > > >> > > > if not) >>> > > >> > > > > > > > > >> > > > } >>> > > >> > > > > > > > > >> > > > } >>> > > >> > > > > > > > > >> > > > >>> > > >> > > > > > > > > >> > > > However, I think this could be clarified, >>> > > >> especially >>> > > >> > > the >>> > > >> > > > > > > > behavior >>> > > >> > > > > > > > > >> when >>> > > >> > > > > > > > > >> > > > unclean.recovery.manager.enabled=false. >>> > > >> Inuitively >>> > > >> > the >>> > > >> > > > > goal >>> > > >> > > > > > > for >>> > > >> > > > > > > > > >> > > > unclean.recovery.manager.enabled=false >>> is to >>> > be >>> > > >> "the >>> > > >> > > > same >>> > > >> > > > > as >>> > > >> > > > > > > > now, >>> > > >> > > > > > > > > >> > mostly" >>> > > >> > > > > > > > > >> > > > but it's very underspecified in the KIP, >>> I >>> > > agree. >>> > > >> > > > > > > > > >> > > > >>> > > >> > > > > > > > > >> > > > > >>> > > >> > > > > > > > > >> > > > > 50. ElectLeadersRequest: "If more than >>> 20 >>> > > >> topics >>> > > >> > are >>> > > >> > > > > > > included, >>> > > >> > > > > > > > > >> only >>> > > >> > > > > > > > > >> > the >>> > > >> > > > > > > > > >> > > > > first 20 will be served. Others will be >>> > > >> returned >>> > > >> > > with >>> > > >> > > > > > > > > >> > DesiredLeaders." >>> > > >> > > > > > > > > >> > > > Hmm, >>> > > >> > > > > > > > > >> > > > > not sure that I understand this. >>> > > >> > > ElectLeadersResponse >>> > > >> > > > > > > doesn't >>> > > >> > > > > > > > > >> have a >>> > > >> > > > > > > > > >> > > > > DesiredLeaders field. >>> > > >> > > > > > > > > >> > > > > >>> > > >> > > > > > > > > >> > > > > 51. GetReplicaLogInfo: "If more than >>> 2000 >>> > > >> > partitions >>> > > >> > > > are >>> > > >> > > > > > > > > included, >>> > > >> > > > > > > > > >> > only >>> > > >> > > > > > > > > >> > > > the >>> > > >> > > > > > > > > >> > > > > first 2000 will be served" Do we >>> return an >>> > > >> error >>> > > >> > for >>> > > >> > > > the >>> > > >> > > > > > > > > remaining >>> > > >> > > > > > > > > >> > > > > partitions? Actually, should we >>> include an >>> > > >> > errorCode >>> > > >> > > > > field >>> > > >> > > > > > > at >>> > > >> > > > > > > > > the >>> > > >> > > > > > > > > >> > > > partition >>> > > >> > > > > > > > > >> > > > > level in GetReplicaLogInfoResponse to >>> cover >>> > > >> > > > non-existing >>> > > >> > > > > > > > > >> partitions >>> > > >> > > > > > > > > >> > and >>> > > >> > > > > > > > > >> > > > no >>> > > >> > > > > > > > > >> > > > > authorization, etc? >>> > > >> > > > > > > > > >> > > > > >>> > > >> > > > > > > > > >> > > > > 52. The entry should matches => The >>> entry >>> > > >> should >>> > > >> > > match >>> > > >> > > > > > > > > >> > > > > >>> > > >> > > > > > > > > >> > > > > 53. ElectLeadersRequest.DesiredLeaders: >>> > > Should >>> > > >> it >>> > > >> > be >>> > > >> > > > > > > nullable >>> > > >> > > > > > > > > >> since a >>> > > >> > > > > > > > > >> > > > user >>> > > >> > > > > > > > > >> > > > > may not specify DesiredLeaders? >>> > > >> > > > > > > > > >> > > > > >>> > > >> > > > > > > > > >> > > > > 54. Downgrade: Is that indeed >>> possible? I >>> > > >> thought >>> > > >> > > > > earlier >>> > > >> > > > > > > you >>> > > >> > > > > > > > > said >>> > > >> > > > > > > > > >> > that >>> > > >> > > > > > > > > >> > > > > once the new version of the records >>> are in >>> > > the >>> > > >> > > > metadata >>> > > >> > > > > > log, >>> > > >> > > > > > > > one >>> > > >> > > > > > > > > >> > can't >>> > > >> > > > > > > > > >> > > > > downgrade since the old broker doesn't >>> know >>> > > >> how to >>> > > >> > > > parse >>> > > >> > > > > > the >>> > > >> > > > > > > > new >>> > > >> > > > > > > > > >> > > version >>> > > >> > > > > > > > > >> > > > of >>> > > >> > > > > > > > > >> > > > > the metadata records? >>> > > >> > > > > > > > > >> > > > > >>> > > >> > > > > > > > > >> > > > >>> > > >> > > > > > > > > >> > > > MetadataVersion downgrade is currently >>> broken >>> > > >> but we >>> > > >> > > > have >>> > > >> > > > > > > fixing >>> > > >> > > > > > > > > it >>> > > >> > > > > > > > > >> on >>> > > >> > > > > > > > > >> > > our >>> > > >> > > > > > > > > >> > > > plate for Kafka 3.7. >>> > > >> > > > > > > > > >> > > > >>> > > >> > > > > > > > > >> > > > The way downgrade works is that "new >>> > features" >>> > > >> are >>> > > >> > > > > dropped, >>> > > >> > > > > > > > > leaving >>> > > >> > > > > > > > > >> > only >>> > > >> > > > > > > > > >> > > > the old ones. >>> > > >> > > > > > > > > >> > > > >>> > > >> > > > > > > > > >> > > > > 55. CleanShutdownFile: Should we add a >>> > > version >>> > > >> > field >>> > > >> > > > for >>> > > >> > > > > > > > future >>> > > >> > > > > > > > > >> > > > extension? >>> > > >> > > > > > > > > >> > > > > >>> > > >> > > > > > > > > >> > > > > 56. Config changes are public facing. >>> Could >>> > > we >>> > > >> > have >>> > > >> > > a >>> > > >> > > > > > > separate >>> > > >> > > > > > > > > >> > section >>> > > >> > > > > > > > > >> > > to >>> > > >> > > > > > > > > >> > > > > document all the config changes? >>> > > >> > > > > > > > > >> > > > >>> > > >> > > > > > > > > >> > > > +1. A separate section for this would be >>> > good. >>> > > >> > > > > > > > > >> > > > >>> > > >> > > > > > > > > >> > > > best, >>> > > >> > > > > > > > > >> > > > Colin >>> > > >> > > > > > > > > >> > > > >>> > > >> > > > > > > > > >> > > > > >>> > > >> > > > > > > > > >> > > > > Thanks, >>> > > >> > > > > > > > > >> > > > > >>> > > >> > > > > > > > > >> > > > > Jun >>> > > >> > > > > > > > > >> > > > > >>> > > >> > > > > > > > > >> > > > > On Mon, Sep 25, 2023 at 4:29 PM Calvin >>> Liu >>> > > >> > > > > > > > > >> > <ca...@confluent.io.invalid >>> > > >> > > > > > > > > >> > > > >>> > > >> > > > > > > > > >> > > > > wrote: >>> > > >> > > > > > > > > >> > > > > >>> > > >> > > > > > > > > >> > > > >> Hi Jun >>> > > >> > > > > > > > > >> > > > >> Thanks for the comments. >>> > > >> > > > > > > > > >> > > > >> >>> > > >> > > > > > > > > >> > > > >> 40. If we change to None, it is not >>> > > guaranteed >>> > > >> > for >>> > > >> > > no >>> > > >> > > > > > data >>> > > >> > > > > > > > > loss. >>> > > >> > > > > > > > > >> For >>> > > >> > > > > > > > > >> > > > users >>> > > >> > > > > > > > > >> > > > >> who are not able to validate the data >>> with >>> > > >> > external >>> > > >> > > > > > > > resources, >>> > > >> > > > > > > > > >> > manual >>> > > >> > > > > > > > > >> > > > >> intervention does not give a better >>> result >>> > > >> but a >>> > > >> > > loss >>> > > >> > > > > of >>> > > >> > > > > > > > > >> > availability. >>> > > >> > > > > > > > > >> > > > So >>> > > >> > > > > > > > > >> > > > >> practically speaking, the Balance mode >>> > would >>> > > >> be a >>> > > >> > > > > better >>> > > >> > > > > > > > > default >>> > > >> > > > > > > > > >> > > value. >>> > > >> > > > > > > > > >> > > > >> >>> > > >> > > > > > > > > >> > > > >> 41. No, it represents how we want to >>> do >>> > the >>> > > >> > unclean >>> > > >> > > > > > leader >>> > > >> > > > > > > > > >> election. >>> > > >> > > > > > > > > >> > > If >>> > > >> > > > > > > > > >> > > > it >>> > > >> > > > > > > > > >> > > > >> is false, the unclean leader election >>> will >>> > > be >>> > > >> the >>> > > >> > > old >>> > > >> > > > > > > random >>> > > >> > > > > > > > > way. >>> > > >> > > > > > > > > >> > > > >> Otherwise, the unclean recovery will >>> be >>> > > used. >>> > > >> > > > > > > > > >> > > > >> >>> > > >> > > > > > > > > >> > > > >> 42. Good catch. Updated. >>> > > >> > > > > > > > > >> > > > >> >>> > > >> > > > > > > > > >> > > > >> 43. Only the first 20 topics will be >>> > served. >>> > > >> > Others >>> > > >> > > > > will >>> > > >> > > > > > be >>> > > >> > > > > > > > > >> returned >>> > > >> > > > > > > > > >> > > > with >>> > > >> > > > > > > > > >> > > > >> InvalidRequestError >>> > > >> > > > > > > > > >> > > > >> >>> > > >> > > > > > > > > >> > > > >> 44. The order matters. The desired >>> leader >>> > > >> entries >>> > > >> > > > match >>> > > >> > > > > > > with >>> > > >> > > > > > > > > the >>> > > >> > > > > > > > > >> > topic >>> > > >> > > > > > > > > >> > > > >> partition list by the index. >>> > > >> > > > > > > > > >> > > > >> >>> > > >> > > > > > > > > >> > > > >> 45. Thanks! Updated. >>> > > >> > > > > > > > > >> > > > >> >>> > > >> > > > > > > > > >> > > > >> 46. Good advice! Updated. >>> > > >> > > > > > > > > >> > > > >> >>> > > >> > > > > > > > > >> > > > >> 47.1, updated the comment. Basically >>> it >>> > will >>> > > >> > elect >>> > > >> > > > the >>> > > >> > > > > > > > replica >>> > > >> > > > > > > > > in >>> > > >> > > > > > > > > >> > the >>> > > >> > > > > > > > > >> > > > >> desiredLeader field to be the leader >>> > > >> > > > > > > > > >> > > > >> >>> > > >> > > > > > > > > >> > > > >> 47.2 We can let the admin client do >>> the >>> > > >> > conversion. >>> > > >> > > > > Using >>> > > >> > > > > > > the >>> > > >> > > > > > > > > >> > > > desiredLeader >>> > > >> > > > > > > > > >> > > > >> field in the json format seems easier >>> for >>> > > >> users. >>> > > >> > > > > > > > > >> > > > >> >>> > > >> > > > > > > > > >> > > > >> 48. Once the MV version is >>> downgraded, all >>> > > the >>> > > >> > ELR >>> > > >> > > > > > related >>> > > >> > > > > > > > > fields >>> > > >> > > > > > > > > >> > will >>> > > >> > > > > > > > > >> > > > be >>> > > >> > > > > > > > > >> > > > >> removed on the next partition change. >>> The >>> > > >> > > controller >>> > > >> > > > > will >>> > > >> > > > > > > > also >>> > > >> > > > > > > > > >> > ignore >>> > > >> > > > > > > > > >> > > > the >>> > > >> > > > > > > > > >> > > > >> ELR fields. Updated the KIP. >>> > > >> > > > > > > > > >> > > > >> >>> > > >> > > > > > > > > >> > > > >> 49. Yes, it would be >>> deprecated/removed. >>> > > >> > > > > > > > > >> > > > >> >>> > > >> > > > > > > > > >> > > > >> >>> > > >> > > > > > > > > >> > > > >> On Mon, Sep 25, 2023 at 3:49 PM Jun >>> Rao >>> > > >> > > > > > > > > <j...@confluent.io.invalid >>> > > >> > > > > > > > > >> > >>> > > >> > > > > > > > > >> > > > wrote: >>> > > >> > > > > > > > > >> > > > >> >>> > > >> > > > > > > > > >> > > > >> > Hi, Calvin, >>> > > >> > > > > > > > > >> > > > >> > >>> > > >> > > > > > > > > >> > > > >> > Thanks for the updated KIP. Made >>> another >>> > > >> pass. >>> > > >> > A >>> > > >> > > > few >>> > > >> > > > > > more >>> > > >> > > > > > > > > >> comments >>> > > >> > > > > > > > > >> > > > below. >>> > > >> > > > > > > > > >> > > > >> > >>> > > >> > > > > > > > > >> > > > >> > 40. >>> unclean.leader.election.enable.false >>> > > -> >>> > > >> > > > > > > > > >> > > > >> > unclean.recovery.strategy.Balanced: >>> The >>> > > >> > Balanced >>> > > >> > > > mode >>> > > >> > > > > > > could >>> > > >> > > > > > > > > >> still >>> > > >> > > > > > > > > >> > > > lead to >>> > > >> > > > > > > > > >> > > > >> > data loss. So, I am wondering if >>> > > >> > > > > > > > > >> > > unclean.leader.election.enable.false >>> > > >> > > > > > > > > >> > > > >> > should map to None? >>> > > >> > > > > > > > > >> > > > >> > >>> > > >> > > > > > > > > >> > > > >> > 41. >>> unclean.recovery.manager.enabled: I >>> > am >>> > > >> not >>> > > >> > > sure >>> > > >> > > > > why >>> > > >> > > > > > > we >>> > > >> > > > > > > > > >> > introduce >>> > > >> > > > > > > > > >> > > > this >>> > > >> > > > > > > > > >> > > > >> > additional config. Is it the same as >>> > > >> > > > > > > > > >> > unclean.recovery.strategy=None? >>> > > >> > > > > > > > > >> > > > >> > >>> > > >> > > > > > > > > >> > > > >> > 42. >>> > > >> > > > DescribeTopicResponse.TopicAuthorizedOperations: >>> > > >> > > > > > > Should >>> > > >> > > > > > > > > >> this >>> > > >> > > > > > > > > >> > be >>> > > >> > > > > > > > > >> > > at >>> > > >> > > > > > > > > >> > > > >> the >>> > > >> > > > > > > > > >> > > > >> > topic level? >>> > > >> > > > > > > > > >> > > > >> > >>> > > >> > > > > > > > > >> > > > >> > 43. "Limit: 20 topics max per >>> request": >>> > > >> Could >>> > > >> > we >>> > > >> > > > > > describe >>> > > >> > > > > > > > > what >>> > > >> > > > > > > > > >> > > > happens if >>> > > >> > > > > > > > > >> > > > >> > the request includes more than 20 >>> > topics? >>> > > >> > > > > > > > > >> > > > >> > >>> > > >> > > > > > > > > >> > > > >> > 44. >>> ElectLeadersRequest.DesiredLeaders: >>> > > >> Could >>> > > >> > we >>> > > >> > > > > > describe >>> > > >> > > > > > > > > >> whether >>> > > >> > > > > > > > > >> > > the >>> > > >> > > > > > > > > >> > > > >> > ordering matters? >>> > > >> > > > > > > > > >> > > > >> > >>> > > >> > > > > > > > > >> > > > >> > 45. >>> GetReplicaLogInfo.TopicPartitions: >>> > > >> "about": >>> > > >> > > > "The >>> > > >> > > > > > > topic >>> > > >> > > > > > > > > >> > > partitions >>> > > >> > > > > > > > > >> > > > to >>> > > >> > > > > > > > > >> > > > >> > elect leaders.": The description in >>> > > "about" >>> > > >> is >>> > > >> > > > > > incorrect. >>> > > >> > > > > > > > > >> > > > >> > >>> > > >> > > > > > > > > >> > > > >> > 46. GetReplicaLogInfoResponse: >>> Should we >>> > > >> nest >>> > > >> > > > > > partitions >>> > > >> > > > > > > > > under >>> > > >> > > > > > > > > >> > > > topicId to >>> > > >> > > > > > > > > >> > > > >> > be consistent with other types of >>> > > responses? >>> > > >> > > > > > > > > >> > > > >> > >>> > > >> > > > > > > > > >> > > > >> > 47. kafka-leader-election.sh: >>> > > >> > > > > > > > > >> > > > >> > 47.1 Could we explain DESIGNATION? >>> > > >> > > > > > > > > >> > > > >> > 47.2 desiredLeader: Should it be a >>> list >>> > to >>> > > >> > match >>> > > >> > > > the >>> > > >> > > > > > > field >>> > > >> > > > > > > > in >>> > > >> > > > > > > > > >> > > > >> > ElectLeadersRequest? >>> > > >> > > > > > > > > >> > > > >> > >>> > > >> > > > > > > > > >> > > > >> > 48. We could add a section on >>> downgrade? >>> > > >> > > > > > > > > >> > > > >> > >>> > > >> > > > > > > > > >> > > > >> > 49. LastKnownLeader: This seems only >>> > > needed >>> > > >> in >>> > > >> > > the >>> > > >> > > > > > first >>> > > >> > > > > > > > > phase >>> > > >> > > > > > > > > >> of >>> > > >> > > > > > > > > >> > > > >> > delivering ELR. Will it be removed >>> when >>> > > the >>> > > >> > > > complete >>> > > >> > > > > > KIP >>> > > >> > > > > > > is >>> > > >> > > > > > > > > >> > > delivered? >>> > > >> > > > > > > > > >> > > > >> > >>> > > >> > > > > > > > > >> > > > >> > Thanks, >>> > > >> > > > > > > > > >> > > > >> > >>> > > >> > > > > > > > > >> > > > >> > Jun >>> > > >> > > > > > > > > >> > > > >> > >>> > > >> > > > > > > > > >> > > > >> > On Tue, Sep 19, 2023 at 1:30 PM >>> Colin >>> > > >> McCabe < >>> > > >> > > > > > > > > >> cmcc...@apache.org> >>> > > >> > > > > > > > > >> > > > wrote: >>> > > >> > > > > > > > > >> > > > >> > >>> > > >> > > > > > > > > >> > > > >> > > Hi Calvin, >>> > > >> > > > > > > > > >> > > > >> > > >>> > > >> > > > > > > > > >> > > > >> > > Thanks for the explanations. I >>> like >>> > the >>> > > >> idea >>> > > >> > of >>> > > >> > > > > using >>> > > >> > > > > > > > none, >>> > > >> > > > > > > > > >> > > > balanced, >>> > > >> > > > > > > > > >> > > > >> > > aggressive. We also had an offline >>> > > >> discussion >>> > > >> > > > about >>> > > >> > > > > > why >>> > > >> > > > > > > > it >>> > > >> > > > > > > > > is >>> > > >> > > > > > > > > >> > good >>> > > >> > > > > > > > > >> > > > to >>> > > >> > > > > > > > > >> > > > >> > use a >>> > > >> > > > > > > > > >> > > > >> > > new config key (basically, so >>> that we >>> > > can >>> > > >> > > > deprecate >>> > > >> > > > > > the >>> > > >> > > > > > > > old >>> > > >> > > > > > > > > >> one >>> > > >> > > > > > > > > >> > > > which >>> > > >> > > > > > > > > >> > > > >> had >>> > > >> > > > > > > > > >> > > > >> > > only false/true values in 4.0) >>> With >>> > > these >>> > > >> > > > changes, >>> > > >> > > > > I >>> > > >> > > > > > am >>> > > >> > > > > > > > +1. >>> > > >> > > > > > > > > >> > > > >> > > >>> > > >> > > > > > > > > >> > > > >> > > best, >>> > > >> > > > > > > > > >> > > > >> > > Colin >>> > > >> > > > > > > > > >> > > > >> > > >>> > > >> > > > > > > > > >> > > > >> > > On Mon, Sep 18, 2023, at 15:54, >>> Calvin >>> > > Liu >>> > > >> > > wrote: >>> > > >> > > > > > > > > >> > > > >> > > > Hi Colin, >>> > > >> > > > > > > > > >> > > > >> > > > Also, can we deprecate >>> > > >> > > > > > unclean.leader.election.enable >>> > > >> > > > > > > > in >>> > > >> > > > > > > > > >> 4.0? >>> > > >> > > > > > > > > >> > > > Before >>> > > >> > > > > > > > > >> > > > >> > > that, >>> > > >> > > > > > > > > >> > > > >> > > > we can have both the config >>> > > >> > > > > > unclean.recovery.strategy >>> > > >> > > > > > > > and >>> > > >> > > > > > > > > >> > > > >> > > > unclean.leader.election.enable >>> > > >> > > > > > > > > >> > > > >> > > > and using the >>> > unclean.recovery.Enabled >>> > > >> to >>> > > >> > > > > determine >>> > > >> > > > > > > > which >>> > > >> > > > > > > > > >> > config >>> > > >> > > > > > > > > >> > > > to >>> > > >> > > > > > > > > >> > > > >> use >>> > > >> > > > > > > > > >> > > > >> > > > during the unclean leader >>> election. >>> > > >> > > > > > > > > >> > > > >> > > > >>> > > >> > > > > > > > > >> > > > >> > > > On Mon, Sep 18, 2023 at 3:51 PM >>> > Calvin >>> > > >> Liu >>> > > >> > < >>> > > >> > > > > > > > > >> > ca...@confluent.io> >>> > > >> > > > > > > > > >> > > > >> wrote: >>> > > >> > > > > > > > > >> > > > >> > > > >>> > > >> > > > > > > > > >> > > > >> > > >> Hi Colin, >>> > > >> > > > > > > > > >> > > > >> > > >> For the >>> unclean.recovery.strategy >>> > > >> config >>> > > >> > > name, >>> > > >> > > > > how >>> > > >> > > > > > > > about >>> > > >> > > > > > > > > >> we >>> > > >> > > > > > > > > >> > use >>> > > >> > > > > > > > > >> > > > the >>> > > >> > > > > > > > > >> > > > >> > > >> following >>> > > >> > > > > > > > > >> > > > >> > > >> None. It basically means no >>> unclean >>> > > >> > recovery >>> > > >> > > > > will >>> > > >> > > > > > be >>> > > >> > > > > > > > > >> > performed. >>> > > >> > > > > > > > > >> > > > >> > > >> Aggressive. It means >>> availability >>> > > goes >>> > > >> > > first. >>> > > >> > > > > > > Whenever >>> > > >> > > > > > > > > the >>> > > >> > > > > > > > > >> > > > partition >>> > > >> > > > > > > > > >> > > > >> > > can't >>> > > >> > > > > > > > > >> > > > >> > > >> elect a durable replica, the >>> > > controller >>> > > >> > will >>> > > >> > > > try >>> > > >> > > > > > the >>> > > >> > > > > > > > > >> unclean >>> > > >> > > > > > > > > >> > > > >> recovery. >>> > > >> > > > > > > > > >> > > > >> > > >> Balanced. It is the balance >>> point >>> > of >>> > > >> the >>> > > >> > > > > > > availability >>> > > >> > > > > > > > > >> > > > >> > first(Aggressive) >>> > > >> > > > > > > > > >> > > > >> > > >> and least availability(None). >>> The >>> > > >> > controller >>> > > >> > > > > > > performs >>> > > >> > > > > > > > > >> unclean >>> > > >> > > > > > > > > >> > > > >> recovery >>> > > >> > > > > > > > > >> > > > >> > > when >>> > > >> > > > > > > > > >> > > > >> > > >> both ISR and ELR are empty. >>> > > >> > > > > > > > > >> > > > >> > > >> >>> > > >> > > > > > > > > >> > > > >> > > >> >>> > > >> > > > > > > > > >> > > > >> > > >> On Fri, Sep 15, 2023 at >>> 11:42 AM >>> > > Calvin >>> > > >> > Liu >>> > > >> > > < >>> > > >> > > > > > > > > >> > > ca...@confluent.io> >>> > > >> > > > > > > > > >> > > > >> > wrote: >>> > > >> > > > > > > > > >> > > > >> > > >> >>> > > >> > > > > > > > > >> > > > >> > > >>> Hi Colin, >>> > > >> > > > > > > > > >> > > > >> > > >>> >>> > > >> > > > > > > > > >> > > > >> > > >>> > So, the proposal is that if >>> > > someone >>> > > >> > sets >>> > > >> > > > > > > > > >> > > > >> > > "unclean.leader.election.enable >>> > > >> > > > > > > > > >> > > > >> > > >>> = true"... >>> > > >> > > > > > > > > >> > > > >> > > >>> >>> > > >> > > > > > > > > >> > > > >> > > >>> >>> > > >> > > > > > > > > >> > > > >> > > >>> The idea is to use one of the >>> > > >> > > > > > > > > >> unclean.leader.election.enable >>> > > >> > > > > > > > > >> > > and >>> > > >> > > > > > > > > >> > > > >> > > >>> unclean.recovery.strategy >>> based on >>> > > the >>> > > >> > > > > > > > > >> > > > unclean.recovery.Enabled. A >>> > > >> > > > > > > > > >> > > > >> > > possible >>> > > >> > > > > > > > > >> > > > >> > > >>> version can be >>> > > >> > > > > > > > > >> > > > >> > > >>> >>> > > >> > > > > > > > > >> > > > >> > > >>> If unclean.recovery.Enabled: >>> > > >> > > > > > > > > >> > > > >> > > >>> >>> > > >> > > > > > > > > >> > > > >> > > >>> { >>> > > >> > > > > > > > > >> > > > >> > > >>> >>> > > >> > > > > > > > > >> > > > >> > > >>> Check >>> unclean.recovery.strategy. >>> > If >>> > > >> set, >>> > > >> > > use >>> > > >> > > > > it. >>> > > >> > > > > > > > > >> Otherwise, >>> > > >> > > > > > > > > >> > > > check >>> > > >> > > > > > > > > >> > > > >> > > >>> >>> unclean.leader.election.enable and >>> > > >> > > translate >>> > > >> > > > it >>> > > >> > > > > > to >>> > > >> > > > > > > > > >> > > > >> > > >>> unclean.recovery.strategy. >>> > > >> > > > > > > > > >> > > > >> > > >>> >>> > > >> > > > > > > > > >> > > > >> > > >>> } else { >>> > > >> > > > > > > > > >> > > > >> > > >>> >>> > > >> > > > > > > > > >> > > > >> > > >>> Use >>> unclean.leader.election.enable >>> > > >> > > > > > > > > >> > > > >> > > >>> >>> > > >> > > > > > > > > >> > > > >> > > >>> } >>> > > >> > > > > > > > > >> > > > >> > > >>> >>> > > >> > > > > > > > > >> > > > >> > > >>> >>> > > >> > > > > > > > > >> > > > >> > > >>> —-------- >>> > > >> > > > > > > > > >> > > > >> > > >>> >>> > > >> > > > > > > > > >> > > > >> > > >>> >The configuration key should >>> be >>> > > >> > > > > > > > > >> > > > >> "unclean.recovery.manager.enabled", >>> > > >> > > > > > > > > >> > > > >> > > >>> right? >>> > > >> > > > > > > > > >> > > > >> > > >>> >>> > > >> > > > > > > > > >> > > > >> > > >>> >>> > > >> > > > > > > > > >> > > > >> > > >>> I think we have two ways of >>> > > choosing a >>> > > >> > > leader >>> > > >> > > > > > > > > uncleanly, >>> > > >> > > > > > > > > >> > > unclean >>> > > >> > > > > > > > > >> > > > >> > leader >>> > > >> > > > > > > > > >> > > > >> > > >>> election and unclean >>> recovery(log >>> > > >> > > inspection) >>> > > >> > > > > and >>> > > >> > > > > > > we >>> > > >> > > > > > > > > try >>> > > >> > > > > > > > > >> to >>> > > >> > > > > > > > > >> > > > switch >>> > > >> > > > > > > > > >> > > > >> > > between >>> > > >> > > > > > > > > >> > > > >> > > >>> them. >>> > > >> > > > > > > > > >> > > > >> > > >>> >>> > > >> > > > > > > > > >> > > > >> > > >>> Do you mean we want to >>> develop two >>> > > >> ways >>> > > >> > of >>> > > >> > > > > > > performing >>> > > >> > > > > > > > > the >>> > > >> > > > > > > > > >> > > > unclean >>> > > >> > > > > > > > > >> > > > >> > > >>> recovery and one of them is >>> using >>> > > >> > “unclean >>> > > >> > > > > > recovery >>> > > >> > > > > > > > > >> > manager”? >>> > > >> > > > > > > > > >> > > I >>> > > >> > > > > > > > > >> > > > >> guess >>> > > >> > > > > > > > > >> > > > >> > > we >>> > > >> > > > > > > > > >> > > > >> > > >>> haven’t discussed the second >>> way. >>> > > >> > > > > > > > > >> > > > >> > > >>> >>> > > >> > > > > > > > > >> > > > >> > > >>> >>> > > >> > > > > > > > > >> > > > >> > > >>> —------- >>> > > >> > > > > > > > > >> > > > >> > > >>> >>> > > >> > > > > > > > > >> > > > >> > > >>> >How do these 4 levels of >>> > overrides >>> > > >> > > interact >>> > > >> > > > > with >>> > > >> > > > > > > > your >>> > > >> > > > > > > > > >> new >>> > > >> > > > > > > > > >> > > > >> > > >>> configurations? >>> > > >> > > > > > > > > >> > > > >> > > >>> >>> > > >> > > > > > > > > >> > > > >> > > >>> >>> > > >> > > > > > > > > >> > > > >> > > >>> I do notice in the Kraft >>> > controller >>> > > >> code, >>> > > >> > > the >>> > > >> > > > > > > method >>> > > >> > > > > > > > to >>> > > >> > > > > > > > > >> > check >>> > > >> > > > > > > > > >> > > > >> whether >>> > > >> > > > > > > > > >> > > > >> > > >>> perform unclean leader >>> election is >>> > > >> hard >>> > > >> > > coded >>> > > >> > > > > to >>> > > >> > > > > > > > false >>> > > >> > > > > > > > > >> since >>> > > >> > > > > > > > > >> > > > >> > > >>> >>> > > >> > 2021(uncleanLeaderElectionEnabledForTopic). >>> > > >> > > > > Isn’t >>> > > >> > > > > > > it >>> > > >> > > > > > > > a >>> > > >> > > > > > > > > >> good >>> > > >> > > > > > > > > >> > > > chance >>> > > >> > > > > > > > > >> > > > >> to >>> > > >> > > > > > > > > >> > > > >> > > >>> completely deprecate the >>> > > >> > > > > > > > > unclean.leader.election.enable? >>> > > >> > > > > > > > > >> We >>> > > >> > > > > > > > > >> > > > don’t >>> > > >> > > > > > > > > >> > > > >> > even >>> > > >> > > > > > > > > >> > > > >> > > have >>> > > >> > > > > > > > > >> > > > >> > > >>> to worry about the config >>> > > conversion. >>> > > >> > > > > > > > > >> > > > >> > > >>> >>> > > >> > > > > > > > > >> > > > >> > > >>> On the other hand, whatever >>> the >>> > > >> override >>> > > >> > > is, >>> > > >> > > > as >>> > > >> > > > > > > long >>> > > >> > > > > > > > as >>> > > >> > > > > > > > > >> the >>> > > >> > > > > > > > > >> > > > >> > controller >>> > > >> > > > > > > > > >> > > > >> > > >>> can have the final effective >>> > > >> > > > > > > > > >> unclean.leader.election.enable, >>> > > >> > > > > > > > > >> > > the >>> > > >> > > > > > > > > >> > > > >> > topic >>> > > >> > > > > > > > > >> > > > >> > > >>> level config >>> > > >> unclean.recovery.strategy, >>> > > >> > the >>> > > >> > > > > > cluster >>> > > >> > > > > > > > > level >>> > > >> > > > > > > > > >> > > config >>> > > >> > > > > > > > > >> > > > >> > > >>> unclean.recovery.Enabled, the >>> > > >> controller >>> > > >> > > can >>> > > >> > > > > > > > calculate >>> > > >> > > > > > > > > >> the >>> > > >> > > > > > > > > >> > > > correct >>> > > >> > > > > > > > > >> > > > >> > > methods >>> > > >> > > > > > > > > >> > > > >> > > >>> to use right? >>> > > >> > > > > > > > > >> > > > >> > > >>> >>> > > >> > > > > > > > > >> > > > >> > > >>> >>> > > >> > > > > > > > > >> > > > >> > > >>> On Fri, Sep 15, 2023 at >>> 10:02 AM >>> > > Colin >>> > > >> > > > McCabe < >>> > > >> > > > > > > > > >> > > > cmcc...@apache.org> >>> > > >> > > > > > > > > >> > > > >> > > wrote: >>> > > >> > > > > > > > > >> > > > >> > > >>> >>> > > >> > > > > > > > > >> > > > >> > > >>>> On Thu, Sep 14, 2023, at >>> 22:23, >>> > > >> Calvin >>> > > >> > Liu >>> > > >> > > > > > wrote: >>> > > >> > > > > > > > > >> > > > >> > > >>>> > Hi Colin >>> > > >> > > > > > > > > >> > > > >> > > >>>> > 1. I think using the new >>> config >>> > > >> name >>> > > >> > is >>> > > >> > > > more >>> > > >> > > > > > > > clear. >>> > > >> > > > > > > > > >> > > > >> > > >>>> > a. The unclean >>> leader >>> > > >> election >>> > > >> > is >>> > > >> > > > > > > actually >>> > > >> > > > > > > > > >> removed >>> > > >> > > > > > > > > >> > > if >>> > > >> > > > > > > > > >> > > > >> > unclean >>> > > >> > > > > > > > > >> > > > >> > > >>>> > recovery is in use. >>> > > >> > > > > > > > > >> > > > >> > > >>>> > b. Using multiple >>> values >>> > > in >>> > > >> > > > > > > > > >> > > > >> unclean.leader.election.enable >>> > > >> > > > > > > > > >> > > > >> > is >>> > > >> > > > > > > > > >> > > > >> > > >>>> > confusing and it will be >>> more >>> > > >> > confusing >>> > > >> > > > > after >>> > > >> > > > > > > > people >>> > > >> > > > > > > > > >> > forget >>> > > >> > > > > > > > > >> > > > >> about >>> > > >> > > > > > > > > >> > > > >> > > this >>> > > >> > > > > > > > > >> > > > >> > > >>>> > discussion. >>> > > >> > > > > > > > > >> > > > >> > > >>>> >>> > > >> > > > > > > > > >> > > > >> > > >>>> Hi Calvin, >>> > > >> > > > > > > > > >> > > > >> > > >>>> >>> > > >> > > > > > > > > >> > > > >> > > >>>> So, the proposal is that if >>> > someone >>> > > >> sets >>> > > >> > > > > > > > > >> > > > >> > > "unclean.leader.election.enable >>> > > >> > > > > > > > > >> > > > >> > > >>>> = true" but then sets one of >>> your >>> > > new >>> > > >> > > > > > > > configurations, >>> > > >> > > > > > > > > >> the >>> > > >> > > > > > > > > >> > > > value of >>> > > >> > > > > > > > > >> > > > >> > > >>>> >>> unclean.leader.election.enable is >>> > > >> > ignored? >>> > > >> > > > > That >>> > > >> > > > > > > > seems >>> > > >> > > > > > > > > >> less >>> > > >> > > > > > > > > >> > > > clear >>> > > >> > > > > > > > > >> > > > >> to >>> > > >> > > > > > > > > >> > > > >> > > me, not >>> > > >> > > > > > > > > >> > > > >> > > >>>> more. Just in general, having >>> > > >> multiple >>> > > >> > > > > > > configuration >>> > > >> > > > > > > > > >> keys >>> > > >> > > > > > > > > >> > to >>> > > >> > > > > > > > > >> > > > >> control >>> > > >> > > > > > > > > >> > > > >> > > the >>> > > >> > > > > > > > > >> > > > >> > > >>>> same thing confuses users. >>> > > Basically, >>> > > >> > they >>> > > >> > > > are >>> > > >> > > > > > > > sitting >>> > > >> > > > > > > > > >> at a >>> > > >> > > > > > > > > >> > > > giant >>> > > >> > > > > > > > > >> > > > >> > > control >>> > > >> > > > > > > > > >> > > > >> > > >>>> panel, and some of the >>> levers do >>> > > >> > nothing. >>> > > >> > > > > > > > > >> > > > >> > > >>>> >>> > > >> > > > > > > > > >> > > > >> > > >>>> > 2. Sorry I forgot to >>> mention in >>> > > the >>> > > >> > > > response >>> > > >> > > > > > > that >>> > > >> > > > > > > > I >>> > > >> > > > > > > > > >> did >>> > > >> > > > > > > > > >> > add >>> > > >> > > > > > > > > >> > > > the >>> > > >> > > > > > > > > >> > > > >> > > >>>> > unclean.recovery.Enabled >>> flag. >>> > > >> > > > > > > > > >> > > > >> > > >>>> >>> > > >> > > > > > > > > >> > > > >> > > >>>> The configuration key should >>> be >>> > > >> > > > > > > > > >> > > > >> "unclean.recovery.manager.enabled", >>> > > >> > > > > > > > > >> > > > >> > > >>>> right? Becuase we can do >>> "unclean >>> > > >> > > recovery" >>> > > >> > > > > > > without >>> > > >> > > > > > > > > the >>> > > >> > > > > > > > > >> > > > manager. >>> > > >> > > > > > > > > >> > > > >> > > Disabling >>> > > >> > > > > > > > > >> > > > >> > > >>>> the manager just means we >>> use a >>> > > >> > different >>> > > >> > > > > > > mechanism >>> > > >> > > > > > > > > for >>> > > >> > > > > > > > > >> > > > recovery. >>> > > >> > > > > > > > > >> > > > >> > > >>>> >>> > > >> > > > > > > > > >> > > > >> > > >>>> > c. Maybe I >>> > underestimated >>> > > >> the >>> > > >> > > > > challenge >>> > > >> > > > > > > of >>> > > >> > > > > > > > > >> > replacing >>> > > >> > > > > > > > > >> > > > the >>> > > >> > > > > > > > > >> > > > >> > > >>>> config. Any >>> > > >> > > > > > > > > >> > > > >> > > >>>> > implementation problems >>> ahead? >>> > > >> > > > > > > > > >> > > > >> > > >>>> >>> > > >> > > > > > > > > >> > > > >> > > >>>> There are four levels of >>> > overrides >>> > > >> for >>> > > >> > > > > > > > > >> > > > >> > unclean.leader.election.enable. >>> > > >> > > > > > > > > >> > > > >> > > >>>> >>> > > >> > > > > > > > > >> > > > >> > > >>>> 1. static configuration for >>> node. >>> > > >> > > > > > > > > >> > > > >> > > >>>> This goes in the >>> > configuration >>> > > >> file, >>> > > >> > > > > > typically >>> > > >> > > > > > > > > named >>> > > >> > > > > > > > > >> > > > >> > > >>>> server.properties >>> > > >> > > > > > > > > >> > > > >> > > >>>> >>> > > >> > > > > > > > > >> > > > >> > > >>>> 2. dynamic configuration for >>> node >>> > > >> > default >>> > > >> > > > > > > > > >> > > > >> > > >>>> ConfigResource(type=BROKER, >>> > > >> name="") >>> > > >> > > > > > > > > >> > > > >> > > >>>> >>> > > >> > > > > > > > > >> > > > >> > > >>>> 3. dynamic configuration for >>> node >>> > > >> > > > > > > > > >> > > > >> > > >>>> ConfigResource(type=BROKER, >>> > > >> > > > name=<controller >>> > > >> > > > > > > id>) >>> > > >> > > > > > > > > >> > > > >> > > >>>> >>> > > >> > > > > > > > > >> > > > >> > > >>>> 4. dynamic configuration for >>> > topic >>> > > >> > > > > > > > > >> > > > >> > > >>>> ConfigResource(type=TOPIC, >>> > > >> > > > > name=<topic-name>) >>> > > >> > > > > > > > > >> > > > >> > > >>>> >>> > > >> > > > > > > > > >> > > > >> > > >>>> How do these 4 levels of >>> > overrides >>> > > >> > > interact >>> > > >> > > > > with >>> > > >> > > > > > > > your >>> > > >> > > > > > > > > >> new >>> > > >> > > > > > > > > >> > > > >> > > >>>> configurations? If the new >>> > > >> > configurations >>> > > >> > > > > > dominate >>> > > >> > > > > > > > > over >>> > > >> > > > > > > > > >> the >>> > > >> > > > > > > > > >> > > old >>> > > >> > > > > > > > > >> > > > >> > ones, >>> > > >> > > > > > > > > >> > > > >> > > it >>> > > >> > > > > > > > > >> > > > >> > > >>>> seems like this will get a >>> lot >>> > more >>> > > >> > > > confusing >>> > > >> > > > > to >>> > > >> > > > > > > > > >> implement >>> > > >> > > > > > > > > >> > > (and >>> > > >> > > > > > > > > >> > > > >> also >>> > > >> > > > > > > > > >> > > > >> > > to >>> > > >> > > > > > > > > >> > > > >> > > >>>> use.) >>> > > >> > > > > > > > > >> > > > >> > > >>>> >>> > > >> > > > > > > > > >> > > > >> > > >>>> Again, I'd recommend just >>> adding >>> > > some >>> > > >> > new >>> > > >> > > > > values >>> > > >> > > > > > > to >>> > > >> > > > > > > > > >> > > > >> > > >>>> >>> unclean.leader.election.enable. >>> > > It's >>> > > >> > > simple >>> > > >> > > > > and >>> > > >> > > > > > > will >>> > > >> > > > > > > > > >> > prevent >>> > > >> > > > > > > > > >> > > > user >>> > > >> > > > > > > > > >> > > > >> > > confusion >>> > > >> > > > > > > > > >> > > > >> > > >>>> (as well as developer >>> confusion.) >>> > > >> > > > > > > > > >> > > > >> > > >>>> >>> > > >> > > > > > > > > >> > > > >> > > >>>> best, >>> > > >> > > > > > > > > >> > > > >> > > >>>> Colin >>> > > >> > > > > > > > > >> > > > >> > > >>>> >>> > > >> > > > > > > > > >> > > > >> > > >>>> >>> > > >> > > > > > > > > >> > > > >> > > >>>> > 3. About the admin client, >>> I >>> > > >> > mentioned 3 >>> > > >> > > > > > changes >>> > > >> > > > > > > > in >>> > > >> > > > > > > > > >> the >>> > > >> > > > > > > > > >> > > > client. >>> > > >> > > > > > > > > >> > > > >> > > >>>> Anything >>> > > >> > > > > > > > > >> > > > >> > > >>>> > else I missed in the KIP? >>> > > >> > > > > > > > > >> > > > >> > > >>>> > a. The client will >>> switch >>> > > to >>> > > >> > using >>> > > >> > > > the >>> > > >> > > > > > new >>> > > >> > > > > > > > RPC >>> > > >> > > > > > > > > >> > > instead >>> > > >> > > > > > > > > >> > > > of >>> > > >> > > > > > > > > >> > > > >> > > >>>> > MetadataRequest for the >>> topics. >>> > > >> > > > > > > > > >> > > > >> > > >>>> > b. The >>> TopicPartitionInfo >>> > > >> used >>> > > >> > in >>> > > >> > > > > > > > > >> TopicDescription >>> > > >> > > > > > > > > >> > > > needs >>> > > >> > > > > > > > > >> > > > >> to >>> > > >> > > > > > > > > >> > > > >> > > add >>> > > >> > > > > > > > > >> > > > >> > > >>>> new >>> > > >> > > > > > > > > >> > > > >> > > >>>> > fields related to the ELR. >>> > > >> > > > > > > > > >> > > > >> > > >>>> > c. The outputs will >>> add >>> > the >>> > > >> ELR >>> > > >> > > > > related >>> > > >> > > > > > > > > fields. >>> > > >> > > > > > > > > >> > > > >> > > >>>> > >>> > > >> > > > > > > > > >> > > > >> > > >>>> > On Thu, Sep 14, 2023 at >>> 9:19 PM >>> > > >> Colin >>> > > >> > > > > McCabe < >>> > > >> > > > > > > > > >> > > > >> cmcc...@apache.org> >>> > > >> > > > > > > > > >> > > > >> > > >>>> wrote: >>> > > >> > > > > > > > > >> > > > >> > > >>>> > >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> Hi Calvin, >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> Thanks for the changes. >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> 1. Earlier I commented >>> that >>> > > >> creating >>> > > >> > > > > > > > > >> > > > >> "unclean.recovery.strategy " >>> > > >> > > > > > > > > >> > > > >> > > is >>> > > >> > > > > > > > > >> > > > >> > > >>>> not >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> necessary, and we can just >>> > reuse >>> > > >> the >>> > > >> > > > > existing >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >>> > "unclean.leader.election.enable" >>> > > >> > > > > > configuration >>> > > >> > > > > > > > key. >>> > > >> > > > > > > > > >> > Let's >>> > > >> > > > > > > > > >> > > > >> discuss >>> > > >> > > > > > > > > >> > > > >> > > >>>> that. >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> 2.I also don't understand >>> why >>> > > you >>> > > >> > > didn't >>> > > >> > > > > add >>> > > >> > > > > > a >>> > > >> > > > > > > > > >> > > > configuration to >>> > > >> > > > > > > > > >> > > > >> > > >>>> enable or >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> disable the Unclean >>> Recovery >>> > > >> Manager. >>> > > >> > > > This >>> > > >> > > > > > > seems >>> > > >> > > > > > > > > >> like a >>> > > >> > > > > > > > > >> > > very >>> > > >> > > > > > > > > >> > > > >> > simple >>> > > >> > > > > > > > > >> > > > >> > > >>>> way to >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> handle the staging issue >>> which >>> > > we >>> > > >> > > > > discussed. >>> > > >> > > > > > > The >>> > > >> > > > > > > > > URM >>> > > >> > > > > > > > > >> can >>> > > >> > > > > > > > > >> > > > just >>> > > >> > > > > > > > > >> > > > >> be >>> > > >> > > > > > > > > >> > > > >> > > >>>> turned off >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> until it is production >>> ready. >>> > > >> Let's >>> > > >> > > > discuss >>> > > >> > > > > > > this. >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> 3. You still need to >>> describe >>> > > the >>> > > >> > > changes >>> > > >> > > > > to >>> > > >> > > > > > > > > >> AdminClient >>> > > >> > > > > > > > > >> > > > that >>> > > >> > > > > > > > > >> > > > >> are >>> > > >> > > > > > > > > >> > > > >> > > >>>> needed >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> to use >>> DescribeTopicRequest. >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> Keep at it. It's looking >>> > better. >>> > > >> :) >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> best, >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> Colin >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> On Thu, Sep 14, 2023, at >>> > 11:03, >>> > > >> > Calvin >>> > > >> > > > Liu >>> > > >> > > > > > > wrote: >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> > Hi Colin >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> > Thanks for the comments! >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> > >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> > I did the following >>> changes >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> > >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> > 1. >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> > >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> > Simplified the API >>> spec >>> > > >> section >>> > > >> > to >>> > > >> > > > > only >>> > > >> > > > > > > > > include >>> > > >> > > > > > > > > >> the >>> > > >> > > > > > > > > >> > > > diff. >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> > 2. >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> > >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> > Reordered the HWM >>> > > requirement >>> > > >> > > > section. >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> > 3. >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> > >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> > Removed the URM >>> > > >> implementation >>> > > >> > > > details >>> > > >> > > > > > to >>> > > >> > > > > > > > keep >>> > > >> > > > > > > > > >> the >>> > > >> > > > > > > > > >> > > > >> necessary >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> > characteristics to >>> > perform >>> > > >> the >>> > > >> > > > unclean >>> > > >> > > > > > > > > recovery. >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> > 1. >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> > >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> > When to perform >>> the >>> > > >> unclean >>> > > >> > > > > recovery >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> > 2. >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> > >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> > Under different >>> > config, >>> > > >> how >>> > > >> > the >>> > > >> > > > > > unclean >>> > > >> > > > > > > > > >> recovery >>> > > >> > > > > > > > > >> > > > finds >>> > > >> > > > > > > > > >> > > > >> > the >>> > > >> > > > > > > > > >> > > > >> > > >>>> leader. >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> > 3. >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> > >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> > How the config >>> > > >> > > > > > > > > unclean.leader.election.enable >>> > > >> > > > > > > > > >> > and >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> > >>> > > unclean.recovery.strategy >>> > > >> are >>> > > >> > > > > > converted >>> > > >> > > > > > > > > when >>> > > >> > > > > > > > > >> > users >>> > > >> > > > > > > > > >> > > > >> > > >>>> enable/disable >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> the >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> > unclean recovery. >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> > 4. >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> > >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> > More details about >>> how we >>> > > >> change >>> > > >> > > > admin >>> > > >> > > > > > > > client. >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> > 5. >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> > >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> > API limits on the >>> > > >> > > > > > GetReplicaLogInfoRequest >>> > > >> > > > > > > > and >>> > > >> > > > > > > > > >> > > > >> > > >>>> DescribeTopicRequest. >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> > 6. >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> > >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> > Two metrics added >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> > 1. >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> > >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> > >>> > > >> > > > > > > > > >> > > > >>> > > >> > Kafka.controller.global_under_min_isr_partition_count >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> > 2. >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> > >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> > >>> > > >> > > > > > > > > >> >>> kafka.controller.unclean_recovery_finished_count >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> > >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> > >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> > On Wed, Sep 13, 2023 at >>> > > 10:46 AM >>> > > >> > > Colin >>> > > >> > > > > > > McCabe < >>> > > >> > > > > > > > > >> > > > >> > > cmcc...@apache.org> >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> wrote: >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> > >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> On Tue, Sep 12, 2023, >>> at >>> > > 17:21, >>> > > >> > > Calvin >>> > > >> > > > > Liu >>> > > >> > > > > > > > > wrote: >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> > Hi Colin >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> > Thanks for the >>> comments! >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> > >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> Hi Calvin, >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> Thanks again for the >>> KIP. >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> One meta-comment: it's >>> > > usually >>> > > >> > > better >>> > > >> > > > to >>> > > >> > > > > > > just >>> > > >> > > > > > > > > do a >>> > > >> > > > > > > > > >> > diff >>> > > >> > > > > > > > > >> > > > on a >>> > > >> > > > > > > > > >> > > > >> > > >>>> message >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> spec >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> file or java file if >>> you're >>> > > >> > > including >>> > > >> > > > > > > changes >>> > > >> > > > > > > > to >>> > > >> > > > > > > > > >> it >>> > > >> > > > > > > > > >> > in >>> > > >> > > > > > > > > >> > > > the >>> > > >> > > > > > > > > >> > > > >> > KIP. >>> > > >> > > > > > > > > >> > > > >> > > >>>> This is >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> easier to read than >>> looking >>> > > for >>> > > >> > "new >>> > > >> > > > > > fields >>> > > >> > > > > > > > > begin" >>> > > >> > > > > > > > > >> > etc. >>> > > >> > > > > > > > > >> > > > in >>> > > >> > > > > > > > > >> > > > >> the >>> > > >> > > > > > > > > >> > > > >> > > >>>> text, and >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> gracefully handles the >>> case >>> > > >> where >>> > > >> > > > > existing >>> > > >> > > > > > > > > fields >>> > > >> > > > > > > > > >> > were >>> > > >> > > > > > > > > >> > > > >> > changed. >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> > Rewrite the >>> Additional >>> > High >>> > > >> > > > Watermark >>> > > >> > > > > > > > > >> advancement >>> > > >> > > > > > > > > >> > > > >> > requirement >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> > There was feedback on >>> > this >>> > > >> > section >>> > > >> > > > > that >>> > > >> > > > > > > some >>> > > >> > > > > > > > > >> > readers >>> > > >> > > > > > > > > >> > > > may >>> > > >> > > > > > > > > >> > > > >> not >>> > > >> > > > > > > > > >> > > > >> > > be >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> familiar >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> > with HWM and >>> Ack=0,1,all >>> > > >> > requests. >>> > > >> > > > > This >>> > > >> > > > > > > can >>> > > >> > > > > > > > > help >>> > > >> > > > > > > > > >> > them >>> > > >> > > > > > > > > >> > > > >> > > understand >>> > > >> > > > > > > > > >> > > > >> > > >>>> the >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> > proposal. I will >>> rewrite >>> > > this >>> > > >> > part >>> > > >> > > > for >>> > > >> > > > > > > more >>> > > >> > > > > > > > > >> > > > readability. >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> > >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> To be clear, I wasn't >>> > > >> suggesting >>> > > >> > > > > dropping >>> > > >> > > > > > > > either >>> > > >> > > > > > > > > >> > > > section. I >>> > > >> > > > > > > > > >> > > > >> > > agree >>> > > >> > > > > > > > > >> > > > >> > > >>>> that >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> they add useful >>> > background. I >>> > > >> was >>> > > >> > > just >>> > > >> > > > > > > > > suggesting >>> > > >> > > > > > > > > >> > that >>> > > >> > > > > > > > > >> > > we >>> > > >> > > > > > > > > >> > > > >> > should >>> > > >> > > > > > > > > >> > > > >> > > >>>> discuss >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> the "acks" setting >>> AFTER >>> > > >> > discussing >>> > > >> > > > the >>> > > >> > > > > > new >>> > > >> > > > > > > > high >>> > > >> > > > > > > > > >> > > > watermark >>> > > >> > > > > > > > > >> > > > >> > > >>>> advancement >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> conditions. We also >>> should >>> > > >> discuss >>> > > >> > > > > acks=0. >>> > > >> > > > > > > > While >>> > > >> > > > > > > > > >> it >>> > > >> > > > > > > > > >> > > isn't >>> > > >> > > > > > > > > >> > > > >> > > >>>> conceptually >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> much >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> different than acks=1 >>> here, >>> > > its >>> > > >> > > > omission >>> > > >> > > > > > > from >>> > > >> > > > > > > > > this >>> > > >> > > > > > > > > >> > > > section >>> > > >> > > > > > > > > >> > > > >> is >>> > > >> > > > > > > > > >> > > > >> > > >>>> confusing. >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> > Unclean recovery >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> > >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> > The plan is to >>> replace >>> > the >>> > > >> > > > > > > > > >> > > > unclean.leader.election.enable >>> > > >> > > > > > > > > >> > > > >> > with >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> > >>> > unclean.recovery.strategy. >>> > > If >>> > > >> > the >>> > > >> > > > > > Unclean >>> > > >> > > > > > > > > >> Recovery >>> > > >> > > > > > > > > >> > is >>> > > >> > > > > > > > > >> > > > >> > enabled >>> > > >> > > > > > > > > >> > > > >> > > >>>> then it >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> deals >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> > with the three >>> options in >>> > > the >>> > > >> > > > > > > > > >> > > > unclean.recovery.strategy. >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> > >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> > >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> > Let’s refine the >>> Unclean >>> > > >> > Recovery. >>> > > >> > > > We >>> > > >> > > > > > have >>> > > >> > > > > > > > > >> already >>> > > >> > > > > > > > > >> > > > taken a >>> > > >> > > > > > > > > >> > > > >> > > lot of >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> > suggestions and I >>> hope to >>> > > >> > enhance >>> > > >> > > > the >>> > > >> > > > > > > > > >> durability of >>> > > >> > > > > > > > > >> > > > Kafka >>> > > >> > > > > > > > > >> > > > >> to >>> > > >> > > > > > > > > >> > > > >> > > the >>> > > >> > > > > > > > > >> > > > >> > > >>>> next >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> level >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> > with this KIP. >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> I am OK with doing the >>> > > unclean >>> > > >> > > leader >>> > > >> > > > > > > recovery >>> > > >> > > > > > > > > >> > > > improvements >>> > > >> > > > > > > > > >> > > > >> in >>> > > >> > > > > > > > > >> > > > >> > > >>>> this KIP. >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> However, I think we >>> need to >>> > > >> really >>> > > >> > > > work >>> > > >> > > > > on >>> > > >> > > > > > > the >>> > > >> > > > > > > > > >> > > > configuration >>> > > >> > > > > > > > > >> > > > >> > > >>>> settings. >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> Configuration >>> overrides are >>> > > >> often >>> > > >> > > > quite >>> > > >> > > > > > > messy. >>> > > >> > > > > > > > > For >>> > > >> > > > > > > > > >> > > > example, >>> > > >> > > > > > > > > >> > > > >> > the >>> > > >> > > > > > > > > >> > > > >> > > >>>> cases >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> where we have >>> > log.roll.hours >>> > > >> and >>> > > >> > > > > > > > > >> log.roll.segment.ms >>> > > >> > > > > > > > > >> > , >>> > > >> > > > > > > > > >> > > > the >>> > > >> > > > > > > > > >> > > > >> > user >>> > > >> > > > > > > > > >> > > > >> > > >>>> has to >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> remember which one >>> takes >>> > > >> > precedence, >>> > > >> > > > and >>> > > >> > > > > > it >>> > > >> > > > > > > is >>> > > >> > > > > > > > > not >>> > > >> > > > > > > > > >> > > > obvious. >>> > > >> > > > > > > > > >> > > > >> > So, >>> > > >> > > > > > > > > >> > > > >> > > >>>> rather >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> than >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> creating a new >>> > configuration, >>> > > >> why >>> > > >> > > not >>> > > >> > > > > add >>> > > >> > > > > > > > > >> additional >>> > > >> > > > > > > > > >> > > > values >>> > > >> > > > > > > > > >> > > > >> to >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> >>> > > >> "unclean.leader.election.enable"? >>> > > >> > I >>> > > >> > > > > think >>> > > >> > > > > > > this >>> > > >> > > > > > > > > >> will >>> > > >> > > > > > > > > >> > be >>> > > >> > > > > > > > > >> > > > >> simpler >>> > > >> > > > > > > > > >> > > > >> > > for >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> people >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> to understand, and >>> simpler >>> > in >>> > > >> the >>> > > >> > > code >>> > > >> > > > > as >>> > > >> > > > > > > > well. >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> What if we continued >>> to use >>> > > >> > > > > > > > > >> > > > "unclean.leader.election.enable" >>> > > >> > > > > > > > > >> > > > >> > but >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> extended >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> it so that it took a >>> > string? >>> > > >> Then >>> > > >> > > the >>> > > >> > > > > > string >>> > > >> > > > > > > > > could >>> > > >> > > > > > > > > >> > have >>> > > >> > > > > > > > > >> > > > >> these >>> > > >> > > > > > > > > >> > > > >> > > >>>> values: >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> never >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> never >>> automatically do >>> > an >>> > > >> > > unclean >>> > > >> > > > > > leader >>> > > >> > > > > > > > > >> election >>> > > >> > > > > > > > > >> > > > under >>> > > >> > > > > > > > > >> > > > >> > any >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> conditions >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> false / default >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> only do an unclean >>> > leader >>> > > >> > > election >>> > > >> > > > > if >>> > > >> > > > > > > > there >>> > > >> > > > > > > > > >> may >>> > > >> > > > > > > > > >> > be >>> > > >> > > > > > > > > >> > > > >> > possible >>> > > >> > > > > > > > > >> > > > >> > > >>>> data >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> loss >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> true / always >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> always do an >>> unclean >>> > > leader >>> > > >> > > > election >>> > > >> > > > > > if >>> > > >> > > > > > > we >>> > > >> > > > > > > > > >> can't >>> > > >> > > > > > > > > >> > > > >> > immediately >>> > > >> > > > > > > > > >> > > > >> > > >>>> elect a >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> leader >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> It's a bit awkward that >>> > false >>> > > >> maps >>> > > >> > > to >>> > > >> > > > > > > default >>> > > >> > > > > > > > > >> rather >>> > > >> > > > > > > > > >> > > > than to >>> > > >> > > > > > > > > >> > > > >> > > >>>> never. But >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> this awkwardness >>> exists if >>> > we >>> > > >> use >>> > > >> > > two >>> > > >> > > > > > > > different >>> > > >> > > > > > > > > >> > > > >> configuration >>> > > >> > > > > > > > > >> > > > >> > > keys >>> > > >> > > > > > > > > >> > > > >> > > >>>> as >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> well. >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> The reason for the >>> > > awkwardness >>> > > >> is >>> > > >> > > that >>> > > >> > > > > we >>> > > >> > > > > > > > simply >>> > > >> > > > > > > > > >> > don't >>> > > >> > > > > > > > > >> > > > want >>> > > >> > > > > > > > > >> > > > >> > most >>> > > >> > > > > > > > > >> > > > >> > > >>>> of the >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> people currently >>> setting >>> > > >> > > > > > > > > >> > > > >> unclean.leader.election.enable=false >>> > > >> > > > > > > > > >> > > > >> > to >>> > > >> > > > > > > > > >> > > > >> > > >>>> get the >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> "never" behavior. We >>> have >>> > to >>> > > >> bite >>> > > >> > > that >>> > > >> > > > > > > bullet. >>> > > >> > > > > > > > > >> Better >>> > > >> > > > > > > > > >> > > to >>> > > >> > > > > > > > > >> > > > be >>> > > >> > > > > > > > > >> > > > >> > > clear >>> > > >> > > > > > > > > >> > > > >> > > >>>> and >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> explicit than hide it. >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> Another thing that's a >>> bit >>> > > >> awkward >>> > > >> > > is >>> > > >> > > > > > having >>> > > >> > > > > > > > two >>> > > >> > > > > > > > > >> > > > different >>> > > >> > > > > > > > > >> > > > >> > ways >>> > > >> > > > > > > > > >> > > > >> > > to >>> > > >> > > > > > > > > >> > > > >> > > >>>> do >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> unclean leader election >>> > > >> specified >>> > > >> > in >>> > > >> > > > the >>> > > >> > > > > > > KIP. >>> > > >> > > > > > > > > You >>> > > >> > > > > > > > > >> > > > descirbe >>> > > >> > > > > > > > > >> > > > >> two >>> > > >> > > > > > > > > >> > > > >> > > >>>> methods: >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> the >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> simple "choose the last >>> > > leader" >>> > > >> > > > method, >>> > > >> > > > > > and >>> > > >> > > > > > > > the >>> > > >> > > > > > > > > >> > > "unclean >>> > > >> > > > > > > > > >> > > > >> > > recovery >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> manager" >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> method. I understand >>> why >>> > you >>> > > >> did >>> > > >> > it >>> > > >> > > > this >>> > > >> > > > > > way >>> > > >> > > > > > > > -- >>> > > >> > > > > > > > > >> > "choose >>> > > >> > > > > > > > > >> > > > the >>> > > >> > > > > > > > > >> > > > >> > last >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> leader" is >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> simple, and will help >>> us >>> > > >> deliver >>> > > >> > an >>> > > >> > > > > > > > > implementation >>> > > >> > > > > > > > > >> > > > quickly, >>> > > >> > > > > > > > > >> > > > >> > > while >>> > > >> > > > > > > > > >> > > > >> > > >>>> the >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> URM >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> is preferable in the >>> long >>> > > >> term. My >>> > > >> > > > > > > suggestion >>> > > >> > > > > > > > > >> here is >>> > > >> > > > > > > > > >> > > to >>> > > >> > > > > > > > > >> > > > >> > > separate >>> > > >> > > > > > > > > >> > > > >> > > >>>> the >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> decision of HOW to do >>> > unclean >>> > > >> > leader >>> > > >> > > > > > > election >>> > > >> > > > > > > > > from >>> > > >> > > > > > > > > >> > the >>> > > >> > > > > > > > > >> > > > >> > decision >>> > > >> > > > > > > > > >> > > > >> > > of >>> > > >> > > > > > > > > >> > > > >> > > >>>> WHEN >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> to >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> do it. >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> So in other words, have >>> > > >> > > > > > > > > >> > > "unclean.leader.election.enable" >>> > > >> > > > > > > > > >> > > > >> > specify >>> > > >> > > > > > > > > >> > > > >> > > >>>> when we >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> do unclean leader >>> election, >>> > > and >>> > > >> > > have a >>> > > >> > > > > new >>> > > >> > > > > > > > > >> > > configuration >>> > > >> > > > > > > > > >> > > > >> like >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> >>> > > >> "unclean.recovery.manager.enable" >>> > > >> > to >>> > > >> > > > > > > determine >>> > > >> > > > > > > > > if >>> > > >> > > > > > > > > >> we >>> > > >> > > > > > > > > >> > > use >>> > > >> > > > > > > > > >> > > > the >>> > > >> > > > > > > > > >> > > > >> > > URM. >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> Presumably the URM will >>> > take >>> > > >> some >>> > > >> > > time >>> > > >> > > > > to >>> > > >> > > > > > > get >>> > > >> > > > > > > > > >> fully >>> > > >> > > > > > > > > >> > > > stable, >>> > > >> > > > > > > > > >> > > > >> so >>> > > >> > > > > > > > > >> > > > >> > > >>>> this can >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> default to false for a >>> > while, >>> > > >> and >>> > > >> > we >>> > > >> > > > can >>> > > >> > > > > > > flip >>> > > >> > > > > > > > > the >>> > > >> > > > > > > > > >> > > > default to >>> > > >> > > > > > > > > >> > > > >> > > true >>> > > >> > > > > > > > > >> > > > >> > > >>>> when >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> we >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> feel ready. >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> The URM is somewhat >>> > > >> > under-described >>> > > >> > > > > here. >>> > > >> > > > > > I >>> > > >> > > > > > > > > think >>> > > >> > > > > > > > > >> we >>> > > >> > > > > > > > > >> > > > need a >>> > > >> > > > > > > > > >> > > > >> > few >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> configurations here >>> for it. >>> > > For >>> > > >> > > > example, >>> > > >> > > > > > we >>> > > >> > > > > > > > > need a >>> > > >> > > > > > > > > >> > > > >> > > configuration to >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> specify >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> how long it should wait >>> > for a >>> > > >> > broker >>> > > >> > > > to >>> > > >> > > > > > > > respond >>> > > >> > > > > > > > > to >>> > > >> > > > > > > > > >> > its >>> > > >> > > > > > > > > >> > > > RPCs >>> > > >> > > > > > > > > >> > > > >> > > before >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> moving >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> on. We also need to >>> > > understand >>> > > >> how >>> > > >> > > the >>> > > >> > > > > URM >>> > > >> > > > > > > > > >> interacts >>> > > >> > > > > > > > > >> > > with >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> >>> > > >> > > > unclean.leader.election.enable=always. I >>> > > >> > > > > > > > assume >>> > > >> > > > > > > > > >> that >>> > > >> > > > > > > > > >> > > with >>> > > >> > > > > > > > > >> > > > >> > > "always" >>> > > >> > > > > > > > > >> > > > >> > > >>>> we >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> will >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> just unconditionally >>> use >>> > the >>> > > >> URM >>> > > >> > > > rather >>> > > >> > > > > > than >>> > > >> > > > > > > > > >> choosing >>> > > >> > > > > > > > > >> > > > >> > randomly. >>> > > >> > > > > > > > > >> > > > >> > > >>>> But this >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> should be spelled out >>> in >>> > the >>> > > >> KIP. >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> > >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> > DescribeTopicRequest >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> > >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> > 1. >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> > Yes, the plan is >>> to >>> > > >> replace >>> > > >> > the >>> > > >> > > > > > > > > >> MetadataRequest >>> > > >> > > > > > > > > >> > > with >>> > > >> > > > > > > > > >> > > > >> the >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> > >>> DescribeTopicRequest >>> > for >>> > > >> the >>> > > >> > > > admin >>> > > >> > > > > > > > clients. >>> > > >> > > > > > > > > >> Will >>> > > >> > > > > > > > > >> > > > check >>> > > >> > > > > > > > > >> > > > >> > the >>> > > >> > > > > > > > > >> > > > >> > > >>>> details. >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> Sounds good. But as I >>> said, >>> > > you >>> > > >> > need >>> > > >> > > > to >>> > > >> > > > > > > > specify >>> > > >> > > > > > > > > >> how >>> > > >> > > > > > > > > >> > > > >> > AdminClient >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> interacts >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> with the new request. >>> This >>> > > will >>> > > >> > > > involve >>> > > >> > > > > > > adding >>> > > >> > > > > > > > > >> some >>> > > >> > > > > > > > > >> > > > fields >>> > > >> > > > > > > > > >> > > > >> to >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> TopicDescription.java. >>> And >>> > > you >>> > > >> > need >>> > > >> > > to >>> > > >> > > > > > > specify >>> > > >> > > > > > > > > the >>> > > >> > > > > > > > > >> > > > changes >>> > > >> > > > > > > > > >> > > > >> to >>> > > >> > > > > > > > > >> > > > >> > > the >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> kafka-topics.sh command >>> > line >>> > > >> tool. >>> > > >> > > > > > Otherwise >>> > > >> > > > > > > > we >>> > > >> > > > > > > > > >> > cannot >>> > > >> > > > > > > > > >> > > > use >>> > > >> > > > > > > > > >> > > > >> the >>> > > >> > > > > > > > > >> > > > >> > > >>>> tool to >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> see >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> the new information. >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> The new requests, >>> > > >> > > DescribeTopicRequest >>> > > >> > > > > and >>> > > >> > > > > > > > > >> > > > >> > > >>>> GetReplicaLogInfoRequest, >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> need >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> to have limits placed >>> on >>> > them >>> > > >> so >>> > > >> > > that >>> > > >> > > > > > their >>> > > >> > > > > > > > size >>> > > >> > > > > > > > > >> > can't >>> > > >> > > > > > > > > >> > > be >>> > > >> > > > > > > > > >> > > > >> > > >>>> infinite. We >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> don't want to >>> propagate the >>> > > >> > current >>> > > >> > > > > > problems >>> > > >> > > > > > > > of >>> > > >> > > > > > > > > >> > > > >> > MetadataRequest, >>> > > >> > > > > > > > > >> > > > >> > > >>>> where >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> clients can request >>> massive >>> > > >> > > responses >>> > > >> > > > > that >>> > > >> > > > > > > can >>> > > >> > > > > > > > > >> mess >>> > > >> > > > > > > > > >> > up >>> > > >> > > > > > > > > >> > > > the >>> > > >> > > > > > > > > >> > > > >> JVM >>> > > >> > > > > > > > > >> > > > >> > > when >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> handled. >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> Adding limits is >>> simple for >>> > > >> > > > > > > > > >> GetReplicaLogInfoRequest >>> > > >> > > > > > > > > >> > -- >>> > > >> > > > > > > > > >> > > > we >>> > > >> > > > > > > > > >> > > > >> can >>> > > >> > > > > > > > > >> > > > >> > > >>>> just say >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> that only 2000 >>> partitions >>> > at >>> > > a >>> > > >> > time >>> > > >> > > > can >>> > > >> > > > > be >>> > > >> > > > > > > > > >> requested. >>> > > >> > > > > > > > > >> > > For >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> DescribeTopicRequest >>> we can >>> > > >> > probably >>> > > >> > > > > just >>> > > >> > > > > > > > limit >>> > > >> > > > > > > > > >> to 20 >>> > > >> > > > > > > > > >> > > > topics >>> > > >> > > > > > > > > >> > > > >> > or >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> something >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> like that, to avoid the >>> > > >> complexity >>> > > >> > > of >>> > > >> > > > > > doing >>> > > >> > > > > > > > > >> > pagination >>> > > >> > > > > > > > > >> > > in >>> > > >> > > > > > > > > >> > > > >> this >>> > > >> > > > > > > > > >> > > > >> > > KIP. >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> > 2. >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> > I can let the >>> broker >>> > > load >>> > > >> the >>> > > >> > > ELR >>> > > >> > > > > > info >>> > > >> > > > > > > so >>> > > >> > > > > > > > > >> that >>> > > >> > > > > > > > > >> > > they >>> > > >> > > > > > > > > >> > > > can >>> > > >> > > > > > > > > >> > > > >> > > serve >>> > > >> > > > > > > > > >> > > > >> > > >>>> the >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> > >>> DescribeTopicRequest >>> > as >>> > > >> well. >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> Yes, it's fine to add >>> to >>> > > >> > > > MetadataCache. >>> > > >> > > > > In >>> > > >> > > > > > > > fact, >>> > > >> > > > > > > > > >> > you'll >>> > > >> > > > > > > > > >> > > > be >>> > > >> > > > > > > > > >> > > > >> > > loading >>> > > >> > > > > > > > > >> > > > >> > > >>>> it >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> anyway once it's added >>> to >>> > > >> > > > > PartitionImage. >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> > 3. >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> > Yeah, it does not >>> make >>> > > >> sense >>> > > >> > to >>> > > >> > > > > have >>> > > >> > > > > > > the >>> > > >> > > > > > > > > >> topic >>> > > >> > > > > > > > > >> > id >>> > > >> > > > > > > > > >> > > if >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> > >>> DescribeTopicRequest >>> > is >>> > > >> only >>> > > >> > > used >>> > > >> > > > > by >>> > > >> > > > > > > the >>> > > >> > > > > > > > > >> admin >>> > > >> > > > > > > > > >> > > > client. >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> OK. That makes things >>> > > simpler. >>> > > >> We >>> > > >> > > can >>> > > >> > > > > > always >>> > > >> > > > > > > > > >> create a >>> > > >> > > > > > > > > >> > > new >>> > > >> > > > > > > > > >> > > > >> API >>> > > >> > > > > > > > > >> > > > >> > > later >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> (hopefully not in this >>> > KIP!) >>> > > to >>> > > >> > > query >>> > > >> > > > by >>> > > >> > > > > > > topic >>> > > >> > > > > > > > > ID. >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> > >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> > >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> > Metrics >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> > >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> > As for overall >>> cluster >>> > > health >>> > > >> > > > > metrics, I >>> > > >> > > > > > > > think >>> > > >> > > > > > > > > >> > > > >> under-min-ISR >>> > > >> > > > > > > > > >> > > > >> > > is >>> > > >> > > > > > > > > >> > > > >> > > >>>> still >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> a >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> > useful one. ELR is >>> more >>> > > like >>> > > >> a >>> > > >> > > > safety >>> > > >> > > > > > > belt. >>> > > >> > > > > > > > > When >>> > > >> > > > > > > > > >> > the >>> > > >> > > > > > > > > >> > > > ELR >>> > > >> > > > > > > > > >> > > > >> is >>> > > >> > > > > > > > > >> > > > >> > > >>>> used, the >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> > cluster availability >>> has >>> > > >> already >>> > > >> > > > been >>> > > >> > > > > > > > > impacted. >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> > >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> > Maybe we can have a >>> > metric >>> > > to >>> > > >> > > count >>> > > >> > > > > the >>> > > >> > > > > > > > > >> partitions >>> > > >> > > > > > > > > >> > > that >>> > > >> > > > > > > > > >> > > > >> > > sum(ISR, >>> > > >> > > > > > > > > >> > > > >> > > >>>> ELR) >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> < >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> min >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> > ISR. What do you >>> think? >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> How about: >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> A. a metric for the >>> totoal >>> > > >> number >>> > > >> > > of >>> > > >> > > > > > > > > >> under-min-isr >>> > > >> > > > > > > > > >> > > > >> > partitions? >>> > > >> > > > > > > > > >> > > > >> > > We >>> > > >> > > > > > > > > >> > > > >> > > >>>> don't >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> have that in Apache >>> Kafka >>> > at >>> > > >> the >>> > > >> > > > moment. >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> B. a metric for the >>> number >>> > of >>> > > >> > > unclean >>> > > >> > > > > > leader >>> > > >> > > > > > > > > >> > elections >>> > > >> > > > > > > > > >> > > we >>> > > >> > > > > > > > > >> > > > >> did >>> > > >> > > > > > > > > >> > > > >> > > (for >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> simplicity, it can >>> reset >>> > to 0 >>> > > >> on >>> > > >> > > > > > controller >>> > > >> > > > > > > > > >> restart: >>> > > >> > > > > > > > > >> > we >>> > > >> > > > > > > > > >> > > > >> expect >>> > > >> > > > > > > > > >> > > > >> > > >>>> people to >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> monitor the change over >>> > time >>> > > >> > anyway) >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> best, >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> Colin >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> > >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> > Yeah, for the ongoing >>> > > unclean >>> > > >> > > > > > recoveries, >>> > > >> > > > > > > > the >>> > > >> > > > > > > > > >> > > > controller >>> > > >> > > > > > > > > >> > > > >> can >>> > > >> > > > > > > > > >> > > > >> > > >>>> keep an >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> > accurate count >>> through >>> > > >> failover >>> > > >> > > > > because >>> > > >> > > > > > > > > >> partition >>> > > >> > > > > > > > > >> > > > >> > registration >>> > > >> > > > > > > > > >> > > > >> > > >>>> can >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> indicate >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> > whether a recovery is >>> > > needed. >>> > > >> > > > However, >>> > > >> > > > > > for >>> > > >> > > > > > > > the >>> > > >> > > > > > > > > >> > > happened >>> > > >> > > > > > > > > >> > > > >> > ones, >>> > > >> > > > > > > > > >> > > > >> > > >>>> unless >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> we >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> > want to persist the >>> > number >>> > > >> > > > somewhere, >>> > > >> > > > > we >>> > > >> > > > > > > can >>> > > >> > > > > > > > > >> only >>> > > >> > > > > > > > > >> > > > figure >>> > > >> > > > > > > > > >> > > > >> it >>> > > >> > > > > > > > > >> > > > >> > > out >>> > > >> > > > > > > > > >> > > > >> > > >>>> from >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> the >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> > log. >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> > >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> > On Tue, Sep 12, 2023 >>> at >>> > > >> 3:16 PM >>> > > >> > > > Colin >>> > > >> > > > > > > > McCabe < >>> > > >> > > > > > > > > >> > > > >> > > cmcc...@apache.org >>> > > >> > > > > > > > > >> > > > >> > > >>>> > >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> wrote: >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> > >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> >> Also, we should have >>> > > metrics >>> > > >> > that >>> > > >> > > > > show >>> > > >> > > > > > > what >>> > > >> > > > > > > > > is >>> > > >> > > > > > > > > >> > going >>> > > >> > > > > > > > > >> > > > on >>> > > >> > > > > > > > > >> > > > >> > with >>> > > >> > > > > > > > > >> > > > >> > > >>>> regard >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> to >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> the >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> >> eligible replica >>> set. >>> > I'm >>> > > >> not >>> > > >> > > sure >>> > > >> > > > > > > exactly >>> > > >> > > > > > > > > >> what to >>> > > >> > > > > > > > > >> > > > >> suggest, >>> > > >> > > > > > > > > >> > > > >> > > but >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> something >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> >> that could identify >>> when >>> > > >> things >>> > > >> > > are >>> > > >> > > > > > going >>> > > >> > > > > > > > > >> wrong in >>> > > >> > > > > > > > > >> > > the >>> > > >> > > > > > > > > >> > > > >> > > clsuter. >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> >> >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> >> For example, maybe a >>> > > metric >>> > > >> for >>> > > >> > > > > > > partitions >>> > > >> > > > > > > > > >> > > containing >>> > > >> > > > > > > > > >> > > > >> > > replicas >>> > > >> > > > > > > > > >> > > > >> > > >>>> that >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> are >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> >> ineligible to be >>> leader? >>> > > >> That >>> > > >> > > would >>> > > >> > > > > > show >>> > > >> > > > > > > a >>> > > >> > > > > > > > > >> spike >>> > > >> > > > > > > > > >> > > when >>> > > >> > > > > > > > > >> > > > a >>> > > >> > > > > > > > > >> > > > >> > > broker >>> > > >> > > > > > > > > >> > > > >> > > >>>> had an >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> >> unclean restart. >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> >> >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> >> Ideally, we'd also >>> have >>> > a >>> > > >> > metric >>> > > >> > > > that >>> > > >> > > > > > > > > indicates >>> > > >> > > > > > > > > >> > when >>> > > >> > > > > > > > > >> > > > an >>> > > >> > > > > > > > > >> > > > >> > > unclear >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> leader >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> >> election or a >>> recovery >>> > > >> > happened. >>> > > >> > > > > It's a >>> > > >> > > > > > > bit >>> > > >> > > > > > > > > >> tricky >>> > > >> > > > > > > > > >> > > > >> because >>> > > >> > > > > > > > > >> > > > >> > > the >>> > > >> > > > > > > > > >> > > > >> > > >>>> simple >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> >> thing, of tracking >>> it >>> > per >>> > > >> > > > controller, >>> > > >> > > > > > may >>> > > >> > > > > > > > be >>> > > >> > > > > > > > > a >>> > > >> > > > > > > > > >> bit >>> > > >> > > > > > > > > >> > > > >> > confusing >>> > > >> > > > > > > > > >> > > > >> > > >>>> during >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> >> failovers. >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> >> >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> >> best, >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> >> Colin >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> >> >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> >> >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> >> On Tue, Sep 12, >>> 2023, at >>> > > >> 14:25, >>> > > >> > > > Colin >>> > > >> > > > > > > > McCabe >>> > > >> > > > > > > > > >> > wrote: >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> >> > Hi Calvin, >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> >> > >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> >> > Thanks for the >>> KIP. I >>> > > >> think >>> > > >> > > this >>> > > >> > > > > is a >>> > > >> > > > > > > > great >>> > > >> > > > > > > > > >> > > > >> improvement. >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> >> > >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> >> >> Additional High >>> > > Watermark >>> > > >> > > > advance >>> > > >> > > > > > > > > >> requirement >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> >> > >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> >> > Typo: change >>> "advance" >>> > > to >>> > > >> > > > > > "advancement" >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> >> > >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> >> >> A bit recap of >>> some >>> > key >>> > > >> > > > concepts. >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> >> > >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> >> > Typo: change >>> "bit" to >>> > > >> "quick" >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> >> > >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> >> >> Ack=1/all produce >>> > > >> request. >>> > > >> > It >>> > > >> > > > > > defines >>> > > >> > > > > > > > when >>> > > >> > > > > > > > > >> the >>> > > >> > > > > > > > > >> > > > Kafka >>> > > >> > > > > > > > > >> > > > >> > > server >>> > > >> > > > > > > > > >> > > > >> > > >>>> should >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> >> respond to the >>> produce >>> > > >> request >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> >> > >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> >> > I think this >>> section >>> > > >> would be >>> > > >> > > > > clearer >>> > > >> > > > > > > if >>> > > >> > > > > > > > we >>> > > >> > > > > > > > > >> > talked >>> > > >> > > > > > > > > >> > > > >> about >>> > > >> > > > > > > > > >> > > > >> > > the >>> > > >> > > > > > > > > >> > > > >> > > >>>> new >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> high >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> >> > watermark >>> advancement >>> > > >> > > requirement >>> > > >> > > > > > > first, >>> > > >> > > > > > > > > and >>> > > >> > > > > > > > > >> > THEN >>> > > >> > > > > > > > > >> > > > >> talked >>> > > >> > > > > > > > > >> > > > >> > > >>>> about its >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> >> > impact on acks=0, >>> > > acks=1, >>> > > >> and >>> > > >> > > > > > > > acks=all. >>> > > >> > > > > > > > > >> > > > acks=all >>> > > >> > > > > > > > > >> > > > >> is >>> > > >> > > > > > > > > >> > > > >> > of >>> > > >> > > > > > > > > >> > > > >> > > >>>> course >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> the >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> >> > main case we care >>> > about >>> > > >> here, >>> > > >> > > so >>> > > >> > > > it >>> > > >> > > > > > > would >>> > > >> > > > > > > > > be >>> > > >> > > > > > > > > >> > good >>> > > >> > > > > > > > > >> > > to >>> > > >> > > > > > > > > >> > > > >> lead >>> > > >> > > > > > > > > >> > > > >> > > with >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> that, >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> >> > rather than >>> delving >>> > into >>> > > >> the >>> > > >> > > > > > > > technicalities >>> > > >> > > > > > > > > >> of >>> > > >> > > > > > > > > >> > > > acks=0/1 >>> > > >> > > > > > > > > >> > > > >> > > first. >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> >> > >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> >> >> Unclean recovery >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> >> > >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> >> > So, here you are >>> > > >> introducing >>> > > >> > a >>> > > >> > > > new >>> > > >> > > > > > > > > >> > configuration, >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> >> > >>> > > unclean.recovery.strategy. >>> > > >> > The >>> > > >> > > > > > > difficult >>> > > >> > > > > > > > > >> thing >>> > > >> > > > > > > > > >> > > here >>> > > >> > > > > > > > > >> > > > is >>> > > >> > > > > > > > > >> > > > >> > that >>> > > >> > > > > > > > > >> > > > >> > > >>>> there >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> is a >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> >> > lot of overlap >>> with >>> > > >> > > > > > > > > >> > > unclean.leader.election.enable. >>> > > >> > > > > > > > > >> > > > So >>> > > >> > > > > > > > > >> > > > >> we >>> > > >> > > > > > > > > >> > > > >> > > >>>> have 3 >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> >> > different >>> settings for >>> > > >> > > > > > > > > >> > unclean.recovery.strategy, >>> > > >> > > > > > > > > >> > > > plus >>> > > >> > > > > > > > > >> > > > >> 2 >>> > > >> > > > > > > > > >> > > > >> > > >>>> different >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> >> > settings for >>> > > >> > > > > > > > > unclean.leader.election.enable, >>> > > >> > > > > > > > > >> > > giving >>> > > >> > > > > > > > > >> > > > a >>> > > >> > > > > > > > > >> > > > >> > cross >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> product of >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> >> > 6 different >>> options. >>> > The >>> > > >> > > > following >>> > > >> > > > > > > > "unclean >>> > > >> > > > > > > > > >> > > recovery >>> > > >> > > > > > > > > >> > > > >> > > manager" >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> section >>> > > >> > > > > > > > > >> > > > >> > > >>>> >> >> >> > on >>> > > >> > > > > > >>> > > >> > > > > >>> > > >> > > > >>> > > >> > > >>> > > >> > >>> > > >> >>> > > > >>> > > >>> > >>> >>> >>> -- >>> David Arthur >>> >> -- -David