Sorry -- not MV but software version. On Wed, Oct 4, 2023 at 9:51 AM Justine Olshan <jols...@confluent.io> wrote:
> Catching up with this discussion. > > I was just curious -- have we had other instances where downgrading MV is > not supported? I think Kafka typically tries to support downgrades, and I > couldn't think of other examples. > > Thanks, > Justine > > On Wed, Oct 4, 2023 at 9:40 AM Calvin Liu <ca...@confluent.io.invalid> > wrote: > >> Hi Jun, >> 54. Marked the software downgrading is not supported. As the old >> controller >> will not understand the new PartitionRecord and PartitionChangeRecord. >> Thanks! >> >> On Wed, Oct 4, 2023 at 9:12 AM Jun Rao <j...@confluent.io.invalid> wrote: >> >> > Hi, Calvin, >> > >> > Thanks for the reply. Just one more comment. >> > >> > 54. It seems that downgrading MV is supported. Is downgrading the >> software >> > version supported? It would be useful to document that. >> > >> > Thanks, >> > >> > Jun >> > >> > On Tue, Oct 3, 2023 at 4:55 PM Artem Livshits >> > <alivsh...@confluent.io.invalid> wrote: >> > >> > > Hi Colin, >> > > >> > > I think in your example "do_unclean_recovery" would need to do >> different >> > > things depending on the strategy. >> > > >> > > do_unclean_recovery() { >> > > if (unclean.recovery.manager.enabled) { >> > > if (strategy == Aggressive) >> > > use UncleanRecoveryManager(waitLastKnownERL=false) // just >> inspect >> > > logs from whoever is available >> > > else >> > > use UncleanRecoveryManager(waitLastKnownERL=true) // must wait >> > for >> > > at least last known ELR >> > > } else { >> > > if (strategy == Aggressive) >> > > choose the last known leader if that is available, or a random >> > leader >> > > if not) >> > > else >> > > wait for last known leader to get back >> > > } >> > > } >> > > >> > > The idea is that the Aggressive strategy would kick in as soon as we >> lost >> > > the leader and would pick a leader from whoever is available; but the >> > > Balanced will only kick in when ELR is empty and will wait for the >> > brokers >> > > that likely have most data to be available. >> > > >> > > On Tue, Oct 3, 2023 at 3:04 PM Colin McCabe <cmcc...@apache.org> >> wrote: >> > > >> > > > On Tue, Oct 3, 2023, at 10:49, Jun Rao wrote: >> > > > > Hi, Calvin, >> > > > > >> > > > > Thanks for the update KIP. A few more comments. >> > > > > >> > > > > 41. Why would a user choose the option to select a random replica >> as >> > > the >> > > > > leader instead of using unclean.recovery.strateg=Aggressive? It >> seems >> > > > that >> > > > > the latter is strictly better? If that's not the case, could we >> fold >> > > this >> > > > > option under unclean.recovery.strategy instead of introducing a >> > > separate >> > > > > config? >> > > > >> > > > Hi Jun, >> > > > >> > > > I thought the flow of control was: >> > > > >> > > > If there is no leader for the partition { >> > > > If (there are unfenced ELR members) { >> > > > choose_an_unfenced_ELR_member >> > > > } else if (there are fenced ELR members AND strategy=Aggressive) { >> > > > do_unclean_recovery >> > > > } else if (there are no ELR members AND strategy != None) { >> > > > do_unclean_recovery >> > > > } else { >> > > > do nothing about the missing leader >> > > > } >> > > > } >> > > > >> > > > do_unclean_recovery() { >> > > > if (unclean.recovery.manager.enabled) { >> > > > use UncleanRecoveryManager >> > > > } else { >> > > > choose the last known leader if that is available, or a random >> > leader >> > > > if not) >> > > > } >> > > > } >> > > > >> > > > However, I think this could be clarified, especially the behavior >> when >> > > > unclean.recovery.manager.enabled=false. Inuitively the goal for >> > > > unclean.recovery.manager.enabled=false is to be "the same as now, >> > mostly" >> > > > but it's very underspecified in the KIP, I agree. >> > > > >> > > > > >> > > > > 50. ElectLeadersRequest: "If more than 20 topics are included, >> only >> > the >> > > > > first 20 will be served. Others will be returned with >> > DesiredLeaders." >> > > > Hmm, >> > > > > not sure that I understand this. ElectLeadersResponse doesn't >> have a >> > > > > DesiredLeaders field. >> > > > > >> > > > > 51. GetReplicaLogInfo: "If more than 2000 partitions are included, >> > only >> > > > the >> > > > > first 2000 will be served" Do we return an error for the remaining >> > > > > partitions? Actually, should we include an errorCode field at the >> > > > partition >> > > > > level in GetReplicaLogInfoResponse to cover non-existing >> partitions >> > and >> > > > no >> > > > > authorization, etc? >> > > > > >> > > > > 52. The entry should matches => The entry should match >> > > > > >> > > > > 53. ElectLeadersRequest.DesiredLeaders: Should it be nullable >> since a >> > > > user >> > > > > may not specify DesiredLeaders? >> > > > > >> > > > > 54. Downgrade: Is that indeed possible? I thought earlier you said >> > that >> > > > > once the new version of the records are in the metadata log, one >> > can't >> > > > > downgrade since the old broker doesn't know how to parse the new >> > > version >> > > > of >> > > > > the metadata records? >> > > > > >> > > > >> > > > MetadataVersion downgrade is currently broken but we have fixing it >> on >> > > our >> > > > plate for Kafka 3.7. >> > > > >> > > > The way downgrade works is that "new features" are dropped, leaving >> > only >> > > > the old ones. >> > > > >> > > > > 55. CleanShutdownFile: Should we add a version field for future >> > > > extension? >> > > > > >> > > > > 56. Config changes are public facing. Could we have a separate >> > section >> > > to >> > > > > document all the config changes? >> > > > >> > > > +1. A separate section for this would be good. >> > > > >> > > > best, >> > > > Colin >> > > > >> > > > > >> > > > > Thanks, >> > > > > >> > > > > Jun >> > > > > >> > > > > On Mon, Sep 25, 2023 at 4:29 PM Calvin Liu >> > <ca...@confluent.io.invalid >> > > > >> > > > > wrote: >> > > > > >> > > > >> Hi Jun >> > > > >> Thanks for the comments. >> > > > >> >> > > > >> 40. If we change to None, it is not guaranteed for no data loss. >> For >> > > > users >> > > > >> who are not able to validate the data with external resources, >> > manual >> > > > >> intervention does not give a better result but a loss of >> > availability. >> > > > So >> > > > >> practically speaking, the Balance mode would be a better default >> > > value. >> > > > >> >> > > > >> 41. No, it represents how we want to do the unclean leader >> election. >> > > If >> > > > it >> > > > >> is false, the unclean leader election will be the old random way. >> > > > >> Otherwise, the unclean recovery will be used. >> > > > >> >> > > > >> 42. Good catch. Updated. >> > > > >> >> > > > >> 43. Only the first 20 topics will be served. Others will be >> returned >> > > > with >> > > > >> InvalidRequestError >> > > > >> >> > > > >> 44. The order matters. The desired leader entries match with the >> > topic >> > > > >> partition list by the index. >> > > > >> >> > > > >> 45. Thanks! Updated. >> > > > >> >> > > > >> 46. Good advice! Updated. >> > > > >> >> > > > >> 47.1, updated the comment. Basically it will elect the replica in >> > the >> > > > >> desiredLeader field to be the leader >> > > > >> >> > > > >> 47.2 We can let the admin client do the conversion. Using the >> > > > desiredLeader >> > > > >> field in the json format seems easier for users. >> > > > >> >> > > > >> 48. Once the MV version is downgraded, all the ELR related fields >> > will >> > > > be >> > > > >> removed on the next partition change. The controller will also >> > ignore >> > > > the >> > > > >> ELR fields. Updated the KIP. >> > > > >> >> > > > >> 49. Yes, it would be deprecated/removed. >> > > > >> >> > > > >> >> > > > >> On Mon, Sep 25, 2023 at 3:49 PM Jun Rao <j...@confluent.io.invalid >> > >> > > > wrote: >> > > > >> >> > > > >> > Hi, Calvin, >> > > > >> > >> > > > >> > Thanks for the updated KIP. Made another pass. A few more >> comments >> > > > below. >> > > > >> > >> > > > >> > 40. unclean.leader.election.enable.false -> >> > > > >> > unclean.recovery.strategy.Balanced: The Balanced mode could >> still >> > > > lead to >> > > > >> > data loss. So, I am wondering if >> > > unclean.leader.election.enable.false >> > > > >> > should map to None? >> > > > >> > >> > > > >> > 41. unclean.recovery.manager.enabled: I am not sure why we >> > introduce >> > > > this >> > > > >> > additional config. Is it the same as >> > unclean.recovery.strategy=None? >> > > > >> > >> > > > >> > 42. DescribeTopicResponse.TopicAuthorizedOperations: Should >> this >> > be >> > > at >> > > > >> the >> > > > >> > topic level? >> > > > >> > >> > > > >> > 43. "Limit: 20 topics max per request": Could we describe what >> > > > happens if >> > > > >> > the request includes more than 20 topics? >> > > > >> > >> > > > >> > 44. ElectLeadersRequest.DesiredLeaders: Could we describe >> whether >> > > the >> > > > >> > ordering matters? >> > > > >> > >> > > > >> > 45. GetReplicaLogInfo.TopicPartitions: "about": "The topic >> > > partitions >> > > > to >> > > > >> > elect leaders.": The description in "about" is incorrect. >> > > > >> > >> > > > >> > 46. GetReplicaLogInfoResponse: Should we nest partitions under >> > > > topicId to >> > > > >> > be consistent with other types of responses? >> > > > >> > >> > > > >> > 47. kafka-leader-election.sh: >> > > > >> > 47.1 Could we explain DESIGNATION? >> > > > >> > 47.2 desiredLeader: Should it be a list to match the field in >> > > > >> > ElectLeadersRequest? >> > > > >> > >> > > > >> > 48. We could add a section on downgrade? >> > > > >> > >> > > > >> > 49. LastKnownLeader: This seems only needed in the first phase >> of >> > > > >> > delivering ELR. Will it be removed when the complete KIP is >> > > delivered? >> > > > >> > >> > > > >> > Thanks, >> > > > >> > >> > > > >> > Jun >> > > > >> > >> > > > >> > On Tue, Sep 19, 2023 at 1:30 PM Colin McCabe < >> cmcc...@apache.org> >> > > > wrote: >> > > > >> > >> > > > >> > > Hi Calvin, >> > > > >> > > >> > > > >> > > Thanks for the explanations. I like the idea of using none, >> > > > balanced, >> > > > >> > > aggressive. We also had an offline discussion about why it is >> > good >> > > > to >> > > > >> > use a >> > > > >> > > new config key (basically, so that we can deprecate the old >> one >> > > > which >> > > > >> had >> > > > >> > > only false/true values in 4.0) With these changes, I am +1. >> > > > >> > > >> > > > >> > > best, >> > > > >> > > Colin >> > > > >> > > >> > > > >> > > On Mon, Sep 18, 2023, at 15:54, Calvin Liu wrote: >> > > > >> > > > Hi Colin, >> > > > >> > > > Also, can we deprecate unclean.leader.election.enable in >> 4.0? >> > > > Before >> > > > >> > > that, >> > > > >> > > > we can have both the config unclean.recovery.strategy and >> > > > >> > > > unclean.leader.election.enable >> > > > >> > > > and using the unclean.recovery.Enabled to determine which >> > config >> > > > to >> > > > >> use >> > > > >> > > > during the unclean leader election. >> > > > >> > > > >> > > > >> > > > On Mon, Sep 18, 2023 at 3:51 PM Calvin Liu < >> > ca...@confluent.io> >> > > > >> wrote: >> > > > >> > > > >> > > > >> > > >> Hi Colin, >> > > > >> > > >> For the unclean.recovery.strategy config name, how about >> we >> > use >> > > > the >> > > > >> > > >> following >> > > > >> > > >> None. It basically means no unclean recovery will be >> > performed. >> > > > >> > > >> Aggressive. It means availability goes first. Whenever the >> > > > partition >> > > > >> > > can't >> > > > >> > > >> elect a durable replica, the controller will try the >> unclean >> > > > >> recovery. >> > > > >> > > >> Balanced. It is the balance point of the availability >> > > > >> > first(Aggressive) >> > > > >> > > >> and least availability(None). The controller performs >> unclean >> > > > >> recovery >> > > > >> > > when >> > > > >> > > >> both ISR and ELR are empty. >> > > > >> > > >> >> > > > >> > > >> >> > > > >> > > >> On Fri, Sep 15, 2023 at 11:42 AM Calvin Liu < >> > > ca...@confluent.io> >> > > > >> > wrote: >> > > > >> > > >> >> > > > >> > > >>> Hi Colin, >> > > > >> > > >>> >> > > > >> > > >>> > So, the proposal is that if someone sets >> > > > >> > > "unclean.leader.election.enable >> > > > >> > > >>> = true"... >> > > > >> > > >>> >> > > > >> > > >>> >> > > > >> > > >>> The idea is to use one of the >> unclean.leader.election.enable >> > > and >> > > > >> > > >>> unclean.recovery.strategy based on the >> > > > unclean.recovery.Enabled. A >> > > > >> > > possible >> > > > >> > > >>> version can be >> > > > >> > > >>> >> > > > >> > > >>> If unclean.recovery.Enabled: >> > > > >> > > >>> >> > > > >> > > >>> { >> > > > >> > > >>> >> > > > >> > > >>> Check unclean.recovery.strategy. If set, use it. >> Otherwise, >> > > > check >> > > > >> > > >>> unclean.leader.election.enable and translate it to >> > > > >> > > >>> unclean.recovery.strategy. >> > > > >> > > >>> >> > > > >> > > >>> } else { >> > > > >> > > >>> >> > > > >> > > >>> Use unclean.leader.election.enable >> > > > >> > > >>> >> > > > >> > > >>> } >> > > > >> > > >>> >> > > > >> > > >>> >> > > > >> > > >>> —-------- >> > > > >> > > >>> >> > > > >> > > >>> >The configuration key should be >> > > > >> "unclean.recovery.manager.enabled", >> > > > >> > > >>> right? >> > > > >> > > >>> >> > > > >> > > >>> >> > > > >> > > >>> I think we have two ways of choosing a leader uncleanly, >> > > unclean >> > > > >> > leader >> > > > >> > > >>> election and unclean recovery(log inspection) and we try >> to >> > > > switch >> > > > >> > > between >> > > > >> > > >>> them. >> > > > >> > > >>> >> > > > >> > > >>> Do you mean we want to develop two ways of performing the >> > > > unclean >> > > > >> > > >>> recovery and one of them is using “unclean recovery >> > manager”? >> > > I >> > > > >> guess >> > > > >> > > we >> > > > >> > > >>> haven’t discussed the second way. >> > > > >> > > >>> >> > > > >> > > >>> >> > > > >> > > >>> —------- >> > > > >> > > >>> >> > > > >> > > >>> >How do these 4 levels of overrides interact with your >> new >> > > > >> > > >>> configurations? >> > > > >> > > >>> >> > > > >> > > >>> >> > > > >> > > >>> I do notice in the Kraft controller code, the method to >> > check >> > > > >> whether >> > > > >> > > >>> perform unclean leader election is hard coded to false >> since >> > > > >> > > >>> 2021(uncleanLeaderElectionEnabledForTopic). Isn’t it a >> good >> > > > chance >> > > > >> to >> > > > >> > > >>> completely deprecate the unclean.leader.election.enable? >> We >> > > > don’t >> > > > >> > even >> > > > >> > > have >> > > > >> > > >>> to worry about the config conversion. >> > > > >> > > >>> >> > > > >> > > >>> On the other hand, whatever the override is, as long as >> the >> > > > >> > controller >> > > > >> > > >>> can have the final effective >> unclean.leader.election.enable, >> > > the >> > > > >> > topic >> > > > >> > > >>> level config unclean.recovery.strategy, the cluster level >> > > config >> > > > >> > > >>> unclean.recovery.Enabled, the controller can calculate >> the >> > > > correct >> > > > >> > > methods >> > > > >> > > >>> to use right? >> > > > >> > > >>> >> > > > >> > > >>> >> > > > >> > > >>> On Fri, Sep 15, 2023 at 10:02 AM Colin McCabe < >> > > > cmcc...@apache.org> >> > > > >> > > wrote: >> > > > >> > > >>> >> > > > >> > > >>>> On Thu, Sep 14, 2023, at 22:23, Calvin Liu wrote: >> > > > >> > > >>>> > Hi Colin >> > > > >> > > >>>> > 1. I think using the new config name is more clear. >> > > > >> > > >>>> > a. The unclean leader election is actually >> removed >> > > if >> > > > >> > unclean >> > > > >> > > >>>> > recovery is in use. >> > > > >> > > >>>> > b. Using multiple values in >> > > > >> unclean.leader.election.enable >> > > > >> > is >> > > > >> > > >>>> > confusing and it will be more confusing after people >> > forget >> > > > >> about >> > > > >> > > this >> > > > >> > > >>>> > discussion. >> > > > >> > > >>>> >> > > > >> > > >>>> Hi Calvin, >> > > > >> > > >>>> >> > > > >> > > >>>> So, the proposal is that if someone sets >> > > > >> > > "unclean.leader.election.enable >> > > > >> > > >>>> = true" but then sets one of your new configurations, >> the >> > > > value of >> > > > >> > > >>>> unclean.leader.election.enable is ignored? That seems >> less >> > > > clear >> > > > >> to >> > > > >> > > me, not >> > > > >> > > >>>> more. Just in general, having multiple configuration >> keys >> > to >> > > > >> control >> > > > >> > > the >> > > > >> > > >>>> same thing confuses users. Basically, they are sitting >> at a >> > > > giant >> > > > >> > > control >> > > > >> > > >>>> panel, and some of the levers do nothing. >> > > > >> > > >>>> >> > > > >> > > >>>> > 2. Sorry I forgot to mention in the response that I >> did >> > add >> > > > the >> > > > >> > > >>>> > unclean.recovery.Enabled flag. >> > > > >> > > >>>> >> > > > >> > > >>>> The configuration key should be >> > > > >> "unclean.recovery.manager.enabled", >> > > > >> > > >>>> right? Becuase we can do "unclean recovery" without the >> > > > manager. >> > > > >> > > Disabling >> > > > >> > > >>>> the manager just means we use a different mechanism for >> > > > recovery. >> > > > >> > > >>>> >> > > > >> > > >>>> > c. Maybe I underestimated the challenge of >> > replacing >> > > > the >> > > > >> > > >>>> config. Any >> > > > >> > > >>>> > implementation problems ahead? >> > > > >> > > >>>> >> > > > >> > > >>>> There are four levels of overrides for >> > > > >> > unclean.leader.election.enable. >> > > > >> > > >>>> >> > > > >> > > >>>> 1. static configuration for node. >> > > > >> > > >>>> This goes in the configuration file, typically named >> > > > >> > > >>>> server.properties >> > > > >> > > >>>> >> > > > >> > > >>>> 2. dynamic configuration for node default >> > > > >> > > >>>> ConfigResource(type=BROKER, name="") >> > > > >> > > >>>> >> > > > >> > > >>>> 3. dynamic configuration for node >> > > > >> > > >>>> ConfigResource(type=BROKER, name=<controller id>) >> > > > >> > > >>>> >> > > > >> > > >>>> 4. dynamic configuration for topic >> > > > >> > > >>>> ConfigResource(type=TOPIC, name=<topic-name>) >> > > > >> > > >>>> >> > > > >> > > >>>> How do these 4 levels of overrides interact with your >> new >> > > > >> > > >>>> configurations? If the new configurations dominate over >> the >> > > old >> > > > >> > ones, >> > > > >> > > it >> > > > >> > > >>>> seems like this will get a lot more confusing to >> implement >> > > (and >> > > > >> also >> > > > >> > > to >> > > > >> > > >>>> use.) >> > > > >> > > >>>> >> > > > >> > > >>>> Again, I'd recommend just adding some new values to >> > > > >> > > >>>> unclean.leader.election.enable. It's simple and will >> > prevent >> > > > user >> > > > >> > > confusion >> > > > >> > > >>>> (as well as developer confusion.) >> > > > >> > > >>>> >> > > > >> > > >>>> best, >> > > > >> > > >>>> Colin >> > > > >> > > >>>> >> > > > >> > > >>>> >> > > > >> > > >>>> > 3. About the admin client, I mentioned 3 changes in >> the >> > > > client. >> > > > >> > > >>>> Anything >> > > > >> > > >>>> > else I missed in the KIP? >> > > > >> > > >>>> > a. The client will switch to using the new RPC >> > > instead >> > > > of >> > > > >> > > >>>> > MetadataRequest for the topics. >> > > > >> > > >>>> > b. The TopicPartitionInfo used in >> TopicDescription >> > > > needs >> > > > >> to >> > > > >> > > add >> > > > >> > > >>>> new >> > > > >> > > >>>> > fields related to the ELR. >> > > > >> > > >>>> > c. The outputs will add the ELR related fields. >> > > > >> > > >>>> > >> > > > >> > > >>>> > On Thu, Sep 14, 2023 at 9:19 PM Colin McCabe < >> > > > >> cmcc...@apache.org> >> > > > >> > > >>>> wrote: >> > > > >> > > >>>> > >> > > > >> > > >>>> >> Hi Calvin, >> > > > >> > > >>>> >> >> > > > >> > > >>>> >> Thanks for the changes. >> > > > >> > > >>>> >> >> > > > >> > > >>>> >> 1. Earlier I commented that creating >> > > > >> "unclean.recovery.strategy " >> > > > >> > > is >> > > > >> > > >>>> not >> > > > >> > > >>>> >> necessary, and we can just reuse the existing >> > > > >> > > >>>> >> "unclean.leader.election.enable" configuration key. >> > Let's >> > > > >> discuss >> > > > >> > > >>>> that. >> > > > >> > > >>>> >> >> > > > >> > > >>>> >> 2.I also don't understand why you didn't add a >> > > > configuration to >> > > > >> > > >>>> enable or >> > > > >> > > >>>> >> disable the Unclean Recovery Manager. This seems >> like a >> > > very >> > > > >> > simple >> > > > >> > > >>>> way to >> > > > >> > > >>>> >> handle the staging issue which we discussed. The URM >> can >> > > > just >> > > > >> be >> > > > >> > > >>>> turned off >> > > > >> > > >>>> >> until it is production ready. Let's discuss this. >> > > > >> > > >>>> >> >> > > > >> > > >>>> >> 3. You still need to describe the changes to >> AdminClient >> > > > that >> > > > >> are >> > > > >> > > >>>> needed >> > > > >> > > >>>> >> to use DescribeTopicRequest. >> > > > >> > > >>>> >> >> > > > >> > > >>>> >> Keep at it. It's looking better. :) >> > > > >> > > >>>> >> >> > > > >> > > >>>> >> best, >> > > > >> > > >>>> >> Colin >> > > > >> > > >>>> >> >> > > > >> > > >>>> >> >> > > > >> > > >>>> >> On Thu, Sep 14, 2023, at 11:03, Calvin Liu wrote: >> > > > >> > > >>>> >> > Hi Colin >> > > > >> > > >>>> >> > Thanks for the comments! >> > > > >> > > >>>> >> > >> > > > >> > > >>>> >> > I did the following changes >> > > > >> > > >>>> >> > >> > > > >> > > >>>> >> > 1. >> > > > >> > > >>>> >> > >> > > > >> > > >>>> >> > Simplified the API spec section to only include >> the >> > > > diff. >> > > > >> > > >>>> >> > 2. >> > > > >> > > >>>> >> > >> > > > >> > > >>>> >> > Reordered the HWM requirement section. >> > > > >> > > >>>> >> > 3. >> > > > >> > > >>>> >> > >> > > > >> > > >>>> >> > Removed the URM implementation details to keep >> the >> > > > >> necessary >> > > > >> > > >>>> >> > characteristics to perform the unclean recovery. >> > > > >> > > >>>> >> > 1. >> > > > >> > > >>>> >> > >> > > > >> > > >>>> >> > When to perform the unclean recovery >> > > > >> > > >>>> >> > 2. >> > > > >> > > >>>> >> > >> > > > >> > > >>>> >> > Under different config, how the unclean >> recovery >> > > > finds >> > > > >> > the >> > > > >> > > >>>> leader. >> > > > >> > > >>>> >> > 3. >> > > > >> > > >>>> >> > >> > > > >> > > >>>> >> > How the config unclean.leader.election.enable >> > and >> > > > >> > > >>>> >> > unclean.recovery.strategy are converted when >> > users >> > > > >> > > >>>> enable/disable >> > > > >> > > >>>> >> the >> > > > >> > > >>>> >> > unclean recovery. >> > > > >> > > >>>> >> > 4. >> > > > >> > > >>>> >> > >> > > > >> > > >>>> >> > More details about how we change admin client. >> > > > >> > > >>>> >> > 5. >> > > > >> > > >>>> >> > >> > > > >> > > >>>> >> > API limits on the GetReplicaLogInfoRequest and >> > > > >> > > >>>> DescribeTopicRequest. >> > > > >> > > >>>> >> > 6. >> > > > >> > > >>>> >> > >> > > > >> > > >>>> >> > Two metrics added >> > > > >> > > >>>> >> > 1. >> > > > >> > > >>>> >> > >> > > > >> > > >>>> >> > >> > > > Kafka.controller.global_under_min_isr_partition_count >> > > > >> > > >>>> >> > 2. >> > > > >> > > >>>> >> > >> > > > >> > > >>>> >> > >> kafka.controller.unclean_recovery_finished_count >> > > > >> > > >>>> >> > >> > > > >> > > >>>> >> > >> > > > >> > > >>>> >> > On Wed, Sep 13, 2023 at 10:46 AM Colin McCabe < >> > > > >> > > cmcc...@apache.org> >> > > > >> > > >>>> >> wrote: >> > > > >> > > >>>> >> > >> > > > >> > > >>>> >> >> On Tue, Sep 12, 2023, at 17:21, Calvin Liu wrote: >> > > > >> > > >>>> >> >> > Hi Colin >> > > > >> > > >>>> >> >> > Thanks for the comments! >> > > > >> > > >>>> >> >> > >> > > > >> > > >>>> >> >> >> > > > >> > > >>>> >> >> Hi Calvin, >> > > > >> > > >>>> >> >> >> > > > >> > > >>>> >> >> Thanks again for the KIP. >> > > > >> > > >>>> >> >> >> > > > >> > > >>>> >> >> One meta-comment: it's usually better to just do a >> > diff >> > > > on a >> > > > >> > > >>>> message >> > > > >> > > >>>> >> spec >> > > > >> > > >>>> >> >> file or java file if you're including changes to >> it >> > in >> > > > the >> > > > >> > KIP. >> > > > >> > > >>>> This is >> > > > >> > > >>>> >> >> easier to read than looking for "new fields begin" >> > etc. >> > > > in >> > > > >> the >> > > > >> > > >>>> text, and >> > > > >> > > >>>> >> >> gracefully handles the case where existing fields >> > were >> > > > >> > changed. >> > > > >> > > >>>> >> >> >> > > > >> > > >>>> >> >> > Rewrite the Additional High Watermark >> advancement >> > > > >> > requirement >> > > > >> > > >>>> >> >> > There was feedback on this section that some >> > readers >> > > > may >> > > > >> not >> > > > >> > > be >> > > > >> > > >>>> >> familiar >> > > > >> > > >>>> >> >> > with HWM and Ack=0,1,all requests. This can help >> > them >> > > > >> > > understand >> > > > >> > > >>>> the >> > > > >> > > >>>> >> >> > proposal. I will rewrite this part for more >> > > > readability. >> > > > >> > > >>>> >> >> > >> > > > >> > > >>>> >> >> >> > > > >> > > >>>> >> >> To be clear, I wasn't suggesting dropping either >> > > > section. I >> > > > >> > > agree >> > > > >> > > >>>> that >> > > > >> > > >>>> >> >> they add useful background. I was just suggesting >> > that >> > > we >> > > > >> > should >> > > > >> > > >>>> discuss >> > > > >> > > >>>> >> >> the "acks" setting AFTER discussing the new high >> > > > watermark >> > > > >> > > >>>> advancement >> > > > >> > > >>>> >> >> conditions. We also should discuss acks=0. While >> it >> > > isn't >> > > > >> > > >>>> conceptually >> > > > >> > > >>>> >> much >> > > > >> > > >>>> >> >> different than acks=1 here, its omission from this >> > > > section >> > > > >> is >> > > > >> > > >>>> confusing. >> > > > >> > > >>>> >> >> >> > > > >> > > >>>> >> >> > Unclean recovery >> > > > >> > > >>>> >> >> > >> > > > >> > > >>>> >> >> > The plan is to replace the >> > > > unclean.leader.election.enable >> > > > >> > with >> > > > >> > > >>>> >> >> > unclean.recovery.strategy. If the Unclean >> Recovery >> > is >> > > > >> > enabled >> > > > >> > > >>>> then it >> > > > >> > > >>>> >> >> deals >> > > > >> > > >>>> >> >> > with the three options in the >> > > > unclean.recovery.strategy. >> > > > >> > > >>>> >> >> > >> > > > >> > > >>>> >> >> > >> > > > >> > > >>>> >> >> > Let’s refine the Unclean Recovery. We have >> already >> > > > taken a >> > > > >> > > lot of >> > > > >> > > >>>> >> >> > suggestions and I hope to enhance the >> durability of >> > > > Kafka >> > > > >> to >> > > > >> > > the >> > > > >> > > >>>> next >> > > > >> > > >>>> >> >> level >> > > > >> > > >>>> >> >> > with this KIP. >> > > > >> > > >>>> >> >> >> > > > >> > > >>>> >> >> I am OK with doing the unclean leader recovery >> > > > improvements >> > > > >> in >> > > > >> > > >>>> this KIP. >> > > > >> > > >>>> >> >> However, I think we need to really work on the >> > > > configuration >> > > > >> > > >>>> settings. >> > > > >> > > >>>> >> >> >> > > > >> > > >>>> >> >> Configuration overrides are often quite messy. For >> > > > example, >> > > > >> > the >> > > > >> > > >>>> cases >> > > > >> > > >>>> >> >> where we have log.roll.hours and >> log.roll.segment.ms >> > , >> > > > the >> > > > >> > user >> > > > >> > > >>>> has to >> > > > >> > > >>>> >> >> remember which one takes precedence, and it is not >> > > > obvious. >> > > > >> > So, >> > > > >> > > >>>> rather >> > > > >> > > >>>> >> than >> > > > >> > > >>>> >> >> creating a new configuration, why not add >> additional >> > > > values >> > > > >> to >> > > > >> > > >>>> >> >> "unclean.leader.election.enable"? I think this >> will >> > be >> > > > >> simpler >> > > > >> > > for >> > > > >> > > >>>> >> people >> > > > >> > > >>>> >> >> to understand, and simpler in the code as well. >> > > > >> > > >>>> >> >> >> > > > >> > > >>>> >> >> What if we continued to use >> > > > "unclean.leader.election.enable" >> > > > >> > but >> > > > >> > > >>>> >> extended >> > > > >> > > >>>> >> >> it so that it took a string? Then the string could >> > have >> > > > >> these >> > > > >> > > >>>> values: >> > > > >> > > >>>> >> >> >> > > > >> > > >>>> >> >> never >> > > > >> > > >>>> >> >> never automatically do an unclean leader >> election >> > > > under >> > > > >> > any >> > > > >> > > >>>> >> conditions >> > > > >> > > >>>> >> >> >> > > > >> > > >>>> >> >> false / default >> > > > >> > > >>>> >> >> only do an unclean leader election if there >> may >> > be >> > > > >> > possible >> > > > >> > > >>>> data >> > > > >> > > >>>> >> loss >> > > > >> > > >>>> >> >> >> > > > >> > > >>>> >> >> true / always >> > > > >> > > >>>> >> >> always do an unclean leader election if we >> can't >> > > > >> > immediately >> > > > >> > > >>>> elect a >> > > > >> > > >>>> >> >> leader >> > > > >> > > >>>> >> >> >> > > > >> > > >>>> >> >> It's a bit awkward that false maps to default >> rather >> > > > than to >> > > > >> > > >>>> never. But >> > > > >> > > >>>> >> >> this awkwardness exists if we use two different >> > > > >> configuration >> > > > >> > > keys >> > > > >> > > >>>> as >> > > > >> > > >>>> >> well. >> > > > >> > > >>>> >> >> The reason for the awkwardness is that we simply >> > don't >> > > > want >> > > > >> > most >> > > > >> > > >>>> of the >> > > > >> > > >>>> >> >> people currently setting >> > > > >> unclean.leader.election.enable=false >> > > > >> > to >> > > > >> > > >>>> get the >> > > > >> > > >>>> >> >> "never" behavior. We have to bite that bullet. >> Better >> > > to >> > > > be >> > > > >> > > clear >> > > > >> > > >>>> and >> > > > >> > > >>>> >> >> explicit than hide it. >> > > > >> > > >>>> >> >> >> > > > >> > > >>>> >> >> Another thing that's a bit awkward is having two >> > > > different >> > > > >> > ways >> > > > >> > > to >> > > > >> > > >>>> do >> > > > >> > > >>>> >> >> unclean leader election specified in the KIP. You >> > > > descirbe >> > > > >> two >> > > > >> > > >>>> methods: >> > > > >> > > >>>> >> the >> > > > >> > > >>>> >> >> simple "choose the last leader" method, and the >> > > "unclean >> > > > >> > > recovery >> > > > >> > > >>>> >> manager" >> > > > >> > > >>>> >> >> method. I understand why you did it this way -- >> > "choose >> > > > the >> > > > >> > last >> > > > >> > > >>>> >> leader" is >> > > > >> > > >>>> >> >> simple, and will help us deliver an implementation >> > > > quickly, >> > > > >> > > while >> > > > >> > > >>>> the >> > > > >> > > >>>> >> URM >> > > > >> > > >>>> >> >> is preferable in the long term. My suggestion >> here is >> > > to >> > > > >> > > separate >> > > > >> > > >>>> the >> > > > >> > > >>>> >> >> decision of HOW to do unclean leader election from >> > the >> > > > >> > decision >> > > > >> > > of >> > > > >> > > >>>> WHEN >> > > > >> > > >>>> >> to >> > > > >> > > >>>> >> >> do it. >> > > > >> > > >>>> >> >> >> > > > >> > > >>>> >> >> So in other words, have >> > > "unclean.leader.election.enable" >> > > > >> > specify >> > > > >> > > >>>> when we >> > > > >> > > >>>> >> >> do unclean leader election, and have a new >> > > configuration >> > > > >> like >> > > > >> > > >>>> >> >> "unclean.recovery.manager.enable" to determine if >> we >> > > use >> > > > the >> > > > >> > > URM. >> > > > >> > > >>>> >> >> Presumably the URM will take some time to get >> fully >> > > > stable, >> > > > >> so >> > > > >> > > >>>> this can >> > > > >> > > >>>> >> >> default to false for a while, and we can flip the >> > > > default to >> > > > >> > > true >> > > > >> > > >>>> when >> > > > >> > > >>>> >> we >> > > > >> > > >>>> >> >> feel ready. >> > > > >> > > >>>> >> >> >> > > > >> > > >>>> >> >> The URM is somewhat under-described here. I think >> we >> > > > need a >> > > > >> > few >> > > > >> > > >>>> >> >> configurations here for it. For example, we need a >> > > > >> > > configuration to >> > > > >> > > >>>> >> specify >> > > > >> > > >>>> >> >> how long it should wait for a broker to respond to >> > its >> > > > RPCs >> > > > >> > > before >> > > > >> > > >>>> >> moving >> > > > >> > > >>>> >> >> on. We also need to understand how the URM >> interacts >> > > with >> > > > >> > > >>>> >> >> unclean.leader.election.enable=always. I assume >> that >> > > with >> > > > >> > > "always" >> > > > >> > > >>>> we >> > > > >> > > >>>> >> will >> > > > >> > > >>>> >> >> just unconditionally use the URM rather than >> choosing >> > > > >> > randomly. >> > > > >> > > >>>> But this >> > > > >> > > >>>> >> >> should be spelled out in the KIP. >> > > > >> > > >>>> >> >> >> > > > >> > > >>>> >> >> > >> > > > >> > > >>>> >> >> > DescribeTopicRequest >> > > > >> > > >>>> >> >> > >> > > > >> > > >>>> >> >> > 1. >> > > > >> > > >>>> >> >> > Yes, the plan is to replace the >> MetadataRequest >> > > with >> > > > >> the >> > > > >> > > >>>> >> >> > DescribeTopicRequest for the admin clients. >> Will >> > > > check >> > > > >> > the >> > > > >> > > >>>> details. >> > > > >> > > >>>> >> >> >> > > > >> > > >>>> >> >> Sounds good. But as I said, you need to specify >> how >> > > > >> > AdminClient >> > > > >> > > >>>> >> interacts >> > > > >> > > >>>> >> >> with the new request. This will involve adding >> some >> > > > fields >> > > > >> to >> > > > >> > > >>>> >> >> TopicDescription.java. And you need to specify the >> > > > changes >> > > > >> to >> > > > >> > > the >> > > > >> > > >>>> >> >> kafka-topics.sh command line tool. Otherwise we >> > cannot >> > > > use >> > > > >> the >> > > > >> > > >>>> tool to >> > > > >> > > >>>> >> see >> > > > >> > > >>>> >> >> the new information. >> > > > >> > > >>>> >> >> >> > > > >> > > >>>> >> >> The new requests, DescribeTopicRequest and >> > > > >> > > >>>> GetReplicaLogInfoRequest, >> > > > >> > > >>>> >> need >> > > > >> > > >>>> >> >> to have limits placed on them so that their size >> > can't >> > > be >> > > > >> > > >>>> infinite. We >> > > > >> > > >>>> >> >> don't want to propagate the current problems of >> > > > >> > MetadataRequest, >> > > > >> > > >>>> where >> > > > >> > > >>>> >> >> clients can request massive responses that can >> mess >> > up >> > > > the >> > > > >> JVM >> > > > >> > > when >> > > > >> > > >>>> >> handled. >> > > > >> > > >>>> >> >> >> > > > >> > > >>>> >> >> Adding limits is simple for >> GetReplicaLogInfoRequest >> > -- >> > > > we >> > > > >> can >> > > > >> > > >>>> just say >> > > > >> > > >>>> >> >> that only 2000 partitions at a time can be >> requested. >> > > For >> > > > >> > > >>>> >> >> DescribeTopicRequest we can probably just limit >> to 20 >> > > > topics >> > > > >> > or >> > > > >> > > >>>> >> something >> > > > >> > > >>>> >> >> like that, to avoid the complexity of doing >> > pagination >> > > in >> > > > >> this >> > > > >> > > KIP. >> > > > >> > > >>>> >> >> >> > > > >> > > >>>> >> >> > 2. >> > > > >> > > >>>> >> >> > I can let the broker load the ELR info so >> that >> > > they >> > > > can >> > > > >> > > serve >> > > > >> > > >>>> the >> > > > >> > > >>>> >> >> > DescribeTopicRequest as well. >> > > > >> > > >>>> >> >> >> > > > >> > > >>>> >> >> Yes, it's fine to add to MetadataCache. In fact, >> > you'll >> > > > be >> > > > >> > > loading >> > > > >> > > >>>> it >> > > > >> > > >>>> >> >> anyway once it's added to PartitionImage. >> > > > >> > > >>>> >> >> >> > > > >> > > >>>> >> >> > 3. >> > > > >> > > >>>> >> >> > Yeah, it does not make sense to have the >> topic >> > id >> > > if >> > > > >> > > >>>> >> >> > DescribeTopicRequest is only used by the >> admin >> > > > client. >> > > > >> > > >>>> >> >> >> > > > >> > > >>>> >> >> OK. That makes things simpler. We can always >> create a >> > > new >> > > > >> API >> > > > >> > > later >> > > > >> > > >>>> >> >> (hopefully not in this KIP!) to query by topic ID. >> > > > >> > > >>>> >> >> >> > > > >> > > >>>> >> >> > >> > > > >> > > >>>> >> >> > >> > > > >> > > >>>> >> >> > Metrics >> > > > >> > > >>>> >> >> > >> > > > >> > > >>>> >> >> > As for overall cluster health metrics, I think >> > > > >> under-min-ISR >> > > > >> > > is >> > > > >> > > >>>> still >> > > > >> > > >>>> >> a >> > > > >> > > >>>> >> >> > useful one. ELR is more like a safety belt. When >> > the >> > > > ELR >> > > > >> is >> > > > >> > > >>>> used, the >> > > > >> > > >>>> >> >> > cluster availability has already been impacted. >> > > > >> > > >>>> >> >> > >> > > > >> > > >>>> >> >> > Maybe we can have a metric to count the >> partitions >> > > that >> > > > >> > > sum(ISR, >> > > > >> > > >>>> ELR) >> > > > >> > > >>>> >> < >> > > > >> > > >>>> >> >> min >> > > > >> > > >>>> >> >> > ISR. What do you think? >> > > > >> > > >>>> >> >> >> > > > >> > > >>>> >> >> How about: >> > > > >> > > >>>> >> >> >> > > > >> > > >>>> >> >> A. a metric for the totoal number of >> under-min-isr >> > > > >> > partitions? >> > > > >> > > We >> > > > >> > > >>>> don't >> > > > >> > > >>>> >> >> have that in Apache Kafka at the moment. >> > > > >> > > >>>> >> >> >> > > > >> > > >>>> >> >> B. a metric for the number of unclean leader >> > elections >> > > we >> > > > >> did >> > > > >> > > (for >> > > > >> > > >>>> >> >> simplicity, it can reset to 0 on controller >> restart: >> > we >> > > > >> expect >> > > > >> > > >>>> people to >> > > > >> > > >>>> >> >> monitor the change over time anyway) >> > > > >> > > >>>> >> >> >> > > > >> > > >>>> >> >> best, >> > > > >> > > >>>> >> >> Colin >> > > > >> > > >>>> >> >> >> > > > >> > > >>>> >> >> >> > > > >> > > >>>> >> >> > >> > > > >> > > >>>> >> >> > Yeah, for the ongoing unclean recoveries, the >> > > > controller >> > > > >> can >> > > > >> > > >>>> keep an >> > > > >> > > >>>> >> >> > accurate count through failover because >> partition >> > > > >> > registration >> > > > >> > > >>>> can >> > > > >> > > >>>> >> >> indicate >> > > > >> > > >>>> >> >> > whether a recovery is needed. However, for the >> > > happened >> > > > >> > ones, >> > > > >> > > >>>> unless >> > > > >> > > >>>> >> we >> > > > >> > > >>>> >> >> > want to persist the number somewhere, we can >> only >> > > > figure >> > > > >> it >> > > > >> > > out >> > > > >> > > >>>> from >> > > > >> > > >>>> >> the >> > > > >> > > >>>> >> >> > log. >> > > > >> > > >>>> >> >> > >> > > > >> > > >>>> >> >> > On Tue, Sep 12, 2023 at 3:16 PM Colin McCabe < >> > > > >> > > cmcc...@apache.org >> > > > >> > > >>>> > >> > > > >> > > >>>> >> wrote: >> > > > >> > > >>>> >> >> > >> > > > >> > > >>>> >> >> >> Also, we should have metrics that show what is >> > going >> > > > on >> > > > >> > with >> > > > >> > > >>>> regard >> > > > >> > > >>>> >> to >> > > > >> > > >>>> >> >> the >> > > > >> > > >>>> >> >> >> eligible replica set. I'm not sure exactly >> what to >> > > > >> suggest, >> > > > >> > > but >> > > > >> > > >>>> >> >> something >> > > > >> > > >>>> >> >> >> that could identify when things are going >> wrong in >> > > the >> > > > >> > > clsuter. >> > > > >> > > >>>> >> >> >> >> > > > >> > > >>>> >> >> >> For example, maybe a metric for partitions >> > > containing >> > > > >> > > replicas >> > > > >> > > >>>> that >> > > > >> > > >>>> >> are >> > > > >> > > >>>> >> >> >> ineligible to be leader? That would show a >> spike >> > > when >> > > > a >> > > > >> > > broker >> > > > >> > > >>>> had an >> > > > >> > > >>>> >> >> >> unclean restart. >> > > > >> > > >>>> >> >> >> >> > > > >> > > >>>> >> >> >> Ideally, we'd also have a metric that indicates >> > when >> > > > an >> > > > >> > > unclear >> > > > >> > > >>>> >> leader >> > > > >> > > >>>> >> >> >> election or a recovery happened. It's a bit >> tricky >> > > > >> because >> > > > >> > > the >> > > > >> > > >>>> simple >> > > > >> > > >>>> >> >> >> thing, of tracking it per controller, may be a >> bit >> > > > >> > confusing >> > > > >> > > >>>> during >> > > > >> > > >>>> >> >> >> failovers. >> > > > >> > > >>>> >> >> >> >> > > > >> > > >>>> >> >> >> best, >> > > > >> > > >>>> >> >> >> Colin >> > > > >> > > >>>> >> >> >> >> > > > >> > > >>>> >> >> >> >> > > > >> > > >>>> >> >> >> On Tue, Sep 12, 2023, at 14:25, Colin McCabe >> > wrote: >> > > > >> > > >>>> >> >> >> > Hi Calvin, >> > > > >> > > >>>> >> >> >> > >> > > > >> > > >>>> >> >> >> > Thanks for the KIP. I think this is a great >> > > > >> improvement. >> > > > >> > > >>>> >> >> >> > >> > > > >> > > >>>> >> >> >> >> Additional High Watermark advance >> requirement >> > > > >> > > >>>> >> >> >> > >> > > > >> > > >>>> >> >> >> > Typo: change "advance" to "advancement" >> > > > >> > > >>>> >> >> >> > >> > > > >> > > >>>> >> >> >> >> A bit recap of some key concepts. >> > > > >> > > >>>> >> >> >> > >> > > > >> > > >>>> >> >> >> > Typo: change "bit" to "quick" >> > > > >> > > >>>> >> >> >> > >> > > > >> > > >>>> >> >> >> >> Ack=1/all produce request. It defines when >> the >> > > > Kafka >> > > > >> > > server >> > > > >> > > >>>> should >> > > > >> > > >>>> >> >> >> respond to the produce request >> > > > >> > > >>>> >> >> >> > >> > > > >> > > >>>> >> >> >> > I think this section would be clearer if we >> > talked >> > > > >> about >> > > > >> > > the >> > > > >> > > >>>> new >> > > > >> > > >>>> >> high >> > > > >> > > >>>> >> >> >> > watermark advancement requirement first, and >> > THEN >> > > > >> talked >> > > > >> > > >>>> about its >> > > > >> > > >>>> >> >> >> > impact on acks=0, acks=1, and acks=all. >> > > > acks=all >> > > > >> is >> > > > >> > of >> > > > >> > > >>>> course >> > > > >> > > >>>> >> the >> > > > >> > > >>>> >> >> >> > main case we care about here, so it would be >> > good >> > > to >> > > > >> lead >> > > > >> > > with >> > > > >> > > >>>> >> that, >> > > > >> > > >>>> >> >> >> > rather than delving into the technicalities >> of >> > > > acks=0/1 >> > > > >> > > first. >> > > > >> > > >>>> >> >> >> > >> > > > >> > > >>>> >> >> >> >> Unclean recovery >> > > > >> > > >>>> >> >> >> > >> > > > >> > > >>>> >> >> >> > So, here you are introducing a new >> > configuration, >> > > > >> > > >>>> >> >> >> > unclean.recovery.strategy. The difficult >> thing >> > > here >> > > > is >> > > > >> > that >> > > > >> > > >>>> there >> > > > >> > > >>>> >> is a >> > > > >> > > >>>> >> >> >> > lot of overlap with >> > > unclean.leader.election.enable. >> > > > So >> > > > >> we >> > > > >> > > >>>> have 3 >> > > > >> > > >>>> >> >> >> > different settings for >> > unclean.recovery.strategy, >> > > > plus >> > > > >> 2 >> > > > >> > > >>>> different >> > > > >> > > >>>> >> >> >> > settings for unclean.leader.election.enable, >> > > giving >> > > > a >> > > > >> > cross >> > > > >> > > >>>> >> product of >> > > > >> > > >>>> >> >> >> > 6 different options. The following "unclean >> > > recovery >> > > > >> > > manager" >> > > > >> > > >>>> >> section >> > > > >> > > >>>> >> >> >> > only applies to one fo those 6 different >> > > > possibilities >> > > > >> (I >> > > > >> > > >>>> think?) >> > > > >> > > >>>> >> >> >> > >> > > > >> > > >>>> >> >> >> > I simply don't think we need so many >> different >> > > > election >> > > > >> > > types. >> > > > >> > > >>>> >> Really >> > > > >> > > >>>> >> >> >> > the use-cases we need are people who want NO >> > > unclean >> > > > >> > > >>>> elections, >> > > > >> > > >>>> >> people >> > > > >> > > >>>> >> >> >> > who want "the reasonable thing" and people >> who >> > > want >> > > > >> > > >>>> avaialbility at >> > > > >> > > >>>> >> >> all >> > > > >> > > >>>> >> >> >> > costs. >> > > > >> > > >>>> >> >> >> > >> > > > >> > > >>>> >> >> >> > Overall, I feel like the first half of the >> KIP >> > is >> > > > about >> > > > >> > the >> > > > >> > > >>>> ELR, >> > > > >> > > >>>> >> and >> > > > >> > > >>>> >> >> >> > the second half is about reworking unclean >> > leader >> > > > >> > > election. It >> > > > >> > > >>>> >> might >> > > > >> > > >>>> >> >> be >> > > > >> > > >>>> >> >> >> > better to move that second half to a separate >> > KIP >> > > so >> > > > >> that >> > > > >> > > we >> > > > >> > > >>>> can >> > > > >> > > >>>> >> >> figure >> > > > >> > > >>>> >> >> >> > it out fully. It should be fine to punt this >> > until >> > > > >> later >> > > > >> > > and >> > > > >> > > >>>> just >> > > > >> > > >>>> >> have >> > > > >> > > >>>> >> >> >> > the current behavior on empty ELR be waiting >> for >> > > the >> > > > >> last >> > > > >> > > >>>> known >> > > > >> > > >>>> >> leader >> > > > >> > > >>>> >> >> >> > to return. After all, that's what we do >> today. >> > > > >> > > >>>> >> >> >> > >> > > > >> > > >>>> >> >> >> >> DescribeTopicRequest >> > > > >> > > >>>> >> >> >> > >> > > > >> > > >>>> >> >> >> > Is the intention for AdminClient to use this >> RPC >> > > for >> > > > >> > > >>>> >> >> >> > Admin#describeTopics ? If so, we need to >> > describe >> > > > all >> > > > >> of >> > > > >> > > the >> > > > >> > > >>>> >> changes >> > > > >> > > >>>> >> >> to >> > > > >> > > >>>> >> >> >> > the admin client API, as well as changes to >> > > > >> command-line >> > > > >> > > >>>> tools like >> > > > >> > > >>>> >> >> >> > kafka-topics.sh (if there are any). For >> example, >> > > you >> > > > >> will >> > > > >> > > >>>> probably >> > > > >> > > >>>> >> >> need >> > > > >> > > >>>> >> >> >> > changes to TopicDescription.java. You will >> also >> > > > need to >> > > > >> > > >>>> provide >> > > > >> > > >>>> >> all of >> > > > >> > > >>>> >> >> >> > the things that admin client needs -- for >> > example, >> > > > >> > > >>>> >> >> >> > TopicAuthorizedOperations. >> > > > >> > > >>>> >> >> >> > >> > > > >> > > >>>> >> >> >> > I also don't think the controller should >> serve >> > > this >> > > > >> > > request. >> > > > >> > > >>>> We >> > > > >> > > >>>> >> want >> > > > >> > > >>>> >> >> to >> > > > >> > > >>>> >> >> >> > minimize load on the controller. Just like >> with >> > > the >> > > > >> other >> > > > >> > > >>>> metadata >> > > > >> > > >>>> >> >> >> > requests like MetadataRequest, this should be >> > > > served by >> > > > >> > > >>>> brokers. >> > > > >> > > >>>> >> >> >> > >> > > > >> > > >>>> >> >> >> > It's a bit confusing why both topic ID and >> topic >> > > > name >> > > > >> are >> > > > >> > > >>>> provided >> > > > >> > > >>>> >> to >> > > > >> > > >>>> >> >> >> > this API. Is the intention that callers >> should >> > set >> > > > one >> > > > >> > but >> > > > >> > > >>>> not the >> > > > >> > > >>>> >> >> >> > other? Or both? This needs to be clarified. >> > Also, >> > > > if we >> > > > >> > do >> > > > >> > > >>>> want to >> > > > >> > > >>>> >> >> >> > support lookups by UUID, that is another >> thing >> > > that >> > > > >> needs >> > > > >> > > to >> > > > >> > > >>>> be >> > > > >> > > >>>> >> added >> > > > >> > > >>>> >> >> >> > to adminclient. >> > > > >> > > >>>> >> >> >> > >> > > > >> > > >>>> >> >> >> > In general, I feel like this should also >> > probably >> > > be >> > > > >> its >> > > > >> > > own >> > > > >> > > >>>> KIP >> > > > >> > > >>>> >> since >> > > > >> > > >>>> >> >> >> > it's fairly complex >> > > > >> > > >>>> >> >> >> > >> > > > >> > > >>>> >> >> >> > best, >> > > > >> > > >>>> >> >> >> > Colin >> > > > >> > > >>>> >> >> >> > >> > > > >> > > >>>> >> >> >> > >> > > > >> > > >>>> >> >> >> > On Thu, Aug 10, 2023, at 15:46, Calvin Liu >> > wrote: >> > > > >> > > >>>> >> >> >> >> Hi everyone, >> > > > >> > > >>>> >> >> >> >> I'd like to discuss a series of enhancement >> to >> > > the >> > > > >> > > >>>> replication >> > > > >> > > >>>> >> >> protocol. >> > > > >> > > >>>> >> >> >> >> >> > > > >> > > >>>> >> >> >> >> A partition replica can experience local >> data >> > > loss >> > > > in >> > > > >> > > unclean >> > > > >> > > >>>> >> >> shutdown >> > > > >> > > >>>> >> >> >> >> scenarios where unflushed data in the OS >> page >> > > > cache is >> > > > >> > > lost >> > > > >> > > >>>> - such >> > > > >> > > >>>> >> >> as an >> > > > >> > > >>>> >> >> >> >> availability zone power outage or a server >> > error. >> > > > The >> > > > >> > > Kafka >> > > > >> > > >>>> >> >> replication >> > > > >> > > >>>> >> >> >> >> protocol is designed to handle these >> situations >> > > by >> > > > >> > > removing >> > > > >> > > >>>> such >> > > > >> > > >>>> >> >> >> replicas >> > > > >> > > >>>> >> >> >> >> from the ISR and only re-adding them once >> they >> > > have >> > > > >> > caught >> > > > >> > > >>>> up and >> > > > >> > > >>>> >> >> >> therefore >> > > > >> > > >>>> >> >> >> >> recovered any lost data. This prevents >> replicas >> > > > that >> > > > >> > lost >> > > > >> > > an >> > > > >> > > >>>> >> >> arbitrary >> > > > >> > > >>>> >> >> >> log >> > > > >> > > >>>> >> >> >> >> suffix, which included committed data, from >> > being >> > > > >> > elected >> > > > >> > > >>>> leader. >> > > > >> > > >>>> >> >> >> >> However, there is a "last replica standing" >> > state >> > > > >> which >> > > > >> > > when >> > > > >> > > >>>> >> combined >> > > > >> > > >>>> >> >> >> with >> > > > >> > > >>>> >> >> >> >> a data loss unclean shutdown event can turn >> a >> > > local >> > > > >> data >> > > > >> > > loss >> > > > >> > > >>>> >> >> scenario >> > > > >> > > >>>> >> >> >> into >> > > > >> > > >>>> >> >> >> >> a global data loss scenario, i.e., committed >> > data >> > > > can >> > > > >> be >> > > > >> > > >>>> removed >> > > > >> > > >>>> >> from >> > > > >> > > >>>> >> >> >> all >> > > > >> > > >>>> >> >> >> >> replicas. When the last replica in the ISR >> > > > experiences >> > > > >> > an >> > > > >> > > >>>> unclean >> > > > >> > > >>>> >> >> >> shutdown >> > > > >> > > >>>> >> >> >> >> and loses committed data, it will be >> reelected >> > > > leader >> > > > >> > > after >> > > > >> > > >>>> >> starting >> > > > >> > > >>>> >> >> up >> > > > >> > > >>>> >> >> >> >> again, causing rejoining followers to >> truncate >> > > > their >> > > > >> > logs >> > > > >> > > and >> > > > >> > > >>>> >> thereby >> > > > >> > > >>>> >> >> >> >> removing the last copies of the committed >> > records >> > > > >> which >> > > > >> > > the >> > > > >> > > >>>> leader >> > > > >> > > >>>> >> >> lost >> > > > >> > > >>>> >> >> >> >> initially. >> > > > >> > > >>>> >> >> >> >> >> > > > >> > > >>>> >> >> >> >> The new KIP will maximize the protection and >> > > > provides >> > > > >> > > >>>> MinISR-1 >> > > > >> > > >>>> >> >> >> tolerance to >> > > > >> > > >>>> >> >> >> >> data loss unclean shutdown events. >> > > > >> > > >>>> >> >> >> >> >> > > > >> > > >>>> >> >> >> >> >> > > > >> > > >>>> >> >> >> >> > > > >> > > >>>> >> >> >> > > > >> > > >>>> >> >> > > > >> > > >>>> >> > > > >> > > >> > > > >> > >> > > > >> >> > > > >> > > >> > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-966%3A+Eligible+Leader+Replicas >> > > > >> > > >>>> >> >> >> >> > > > >> > > >>>> >> >> >> > > > >> > > >>>> >> >> > > > >> > > >>>> >> > > > >> > > >>> >> > > > >> > > >> > > > >> > >> > > > >> >> > > > >> > > >> > >> >