Hi Calvin,

Thanks for the explanations. I like the idea of using none, balanced, and aggressive. We also had an offline discussion about why it is good to use a new config key (basically, so that we can deprecate the old one, which had only false/true values, in 4.0).

With these changes, I am +1.

best,
Colin
On Mon, Sep 18, 2023, at 15:54, Calvin Liu wrote:
> Hi Colin,
> Also, can we deprecate unclean.leader.election.enable in 4.0? Before
> that, we can have both the config unclean.recovery.strategy and
> unclean.leader.election.enable, and use unclean.recovery.Enabled to
> determine which config to use during the unclean leader election.
>
> On Mon, Sep 18, 2023 at 3:51 PM Calvin Liu <ca...@confluent.io> wrote:
>
>> Hi Colin,
>> For the unclean.recovery.strategy config name, how about we use the
>> following?
>>
>> None. No unclean recovery will be performed.
>>
>> Aggressive. Availability goes first. Whenever the partition can't
>> elect a durable replica, the controller will try the unclean
>> recovery.
>>
>> Balanced. The balance point between availability first (Aggressive)
>> and least availability (None). The controller performs unclean
>> recovery when both ISR and ELR are empty.
>>
>> On Fri, Sep 15, 2023 at 11:42 AM Calvin Liu <ca...@confluent.io> wrote:
>>
>>> Hi Colin,
>>>
>>> > So, the proposal is that if someone sets
>>> > "unclean.leader.election.enable = true"...
>>>
>>> The idea is to use one of unclean.leader.election.enable and
>>> unclean.recovery.strategy based on unclean.recovery.Enabled. A
>>> possible version can be:
>>>
>>> if unclean.recovery.Enabled {
>>>     Check unclean.recovery.strategy. If set, use it. Otherwise,
>>>     check unclean.leader.election.enable and translate it to
>>>     unclean.recovery.strategy.
>>> } else {
>>>     Use unclean.leader.election.enable.
>>> }
>>>
>>> ----
>>>
>>> > The configuration key should be "unclean.recovery.manager.enabled",
>>> > right?
>>>
>>> I think we have two ways of choosing a leader uncleanly, unclean
>>> leader election and unclean recovery (log inspection), and we try to
>>> switch between them.
>>>
>>> Do you mean we want to develop two ways of performing the unclean
>>> recovery, and one of them is using the "unclean recovery manager"? I
>>> guess we haven't discussed the second way.
>>>
>>> ----
>>>
>>> > How do these 4 levels of overrides interact with your new
>>> > configurations?
>>>
>>> I do notice that in the KRaft controller code, the method that checks
>>> whether to perform unclean leader election has been hard coded to
>>> false since 2021 (uncleanLeaderElectionEnabledForTopic). Isn't it a
>>> good chance to completely deprecate unclean.leader.election.enable?
>>> We don't even have to worry about the config conversion.
>>>
>>> On the other hand, whatever the override is, as long as the
>>> controller has the final effective unclean.leader.election.enable,
>>> the topic-level config unclean.recovery.strategy, and the
>>> cluster-level config unclean.recovery.Enabled, the controller can
>>> calculate the correct method to use, right?
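A minimal sketch, in Java, of the resolution logic Calvin describes above. The enum, the method, and the exact false-to-Balanced translation are assumptions drawn from this thread, not actual Kafka code.

    final class ConfigResolution {
        enum UncleanRecoveryStrategy { NONE, BALANCED, AGGRESSIVE }

        // Resolve the effective behavior from the three proposed configs.
        static UncleanRecoveryStrategy resolve(
                boolean uncleanRecoveryEnabled,          // unclean.recovery.Enabled
                UncleanRecoveryStrategy strategyOrNull,  // unclean.recovery.strategy, null if unset
                boolean uncleanLeaderElectionEnable) {   // legacy unclean.leader.election.enable
            if (uncleanRecoveryEnabled) {
                // Prefer the new config; otherwise translate the legacy boolean.
                if (strategyOrNull != null) {
                    return strategyOrNull;
                }
                return uncleanLeaderElectionEnable
                    ? UncleanRecoveryStrategy.AGGRESSIVE
                    : UncleanRecoveryStrategy.BALANCED;
            }
            // Recovery disabled: the legacy flag alone decides whether any
            // unclean election happens at all.
            return uncleanLeaderElectionEnable
                ? UncleanRecoveryStrategy.AGGRESSIVE
                : UncleanRecoveryStrategy.NONE;
        }
    }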
>>>
>>> On Fri, Sep 15, 2023 at 10:02 AM Colin McCabe <cmcc...@apache.org> wrote:
>>>
>>>> On Thu, Sep 14, 2023, at 22:23, Calvin Liu wrote:
>>>> > Hi Colin
>>>> > 1. I think using the new config name is clearer.
>>>> >    a. The unclean leader election is actually removed if unclean
>>>> >       recovery is in use.
>>>> >    b. Using multiple values in unclean.leader.election.enable is
>>>> >       confusing, and it will be more confusing after people forget
>>>> >       about this discussion.
>>>>
>>>> Hi Calvin,
>>>>
>>>> So, the proposal is that if someone sets
>>>> "unclean.leader.election.enable = true" but then sets one of your new
>>>> configurations, the value of unclean.leader.election.enable is
>>>> ignored? That seems less clear to me, not more. Just in general,
>>>> having multiple configuration keys to control the same thing confuses
>>>> users. Basically, they are sitting at a giant control panel, and some
>>>> of the levers do nothing.
>>>>
>>>> > 2. Sorry, I forgot to mention in the response that I did add the
>>>> >    unclean.recovery.Enabled flag.
>>>>
>>>> The configuration key should be "unclean.recovery.manager.enabled",
>>>> right? Because we can do "unclean recovery" without the manager.
>>>> Disabling the manager just means we use a different mechanism for
>>>> recovery.
>>>>
>>>> >    c. Maybe I underestimated the challenge of replacing the
>>>> >       config. Any implementation problems ahead?
>>>>
>>>> There are four levels of overrides for unclean.leader.election.enable:
>>>>
>>>> 1. static configuration for node.
>>>>    This goes in the configuration file, typically named
>>>>    server.properties.
>>>>
>>>> 2. dynamic configuration for node default:
>>>>    ConfigResource(type=BROKER, name="")
>>>>
>>>> 3. dynamic configuration for node:
>>>>    ConfigResource(type=BROKER, name=<controller id>)
>>>>
>>>> 4. dynamic configuration for topic:
>>>>    ConfigResource(type=TOPIC, name=<topic-name>)
>>>>
>>>> How do these 4 levels of overrides interact with your new
>>>> configurations? If the new configurations dominate over the old ones,
>>>> it seems like this will get a lot more confusing to implement (and
>>>> also to use).
>>>>
>>>> Again, I'd recommend just adding some new values to
>>>> unclean.leader.election.enable. It's simple and will prevent user
>>>> confusion (as well as developer confusion).
>>>>
>>>> best,
>>>> Colin
>>>>
>>>> > 3. About the admin client, I mentioned 3 changes in the client.
>>>> >    Anything else I missed in the KIP?
>>>> >    a. The client will switch to using the new RPC instead of
>>>> >       MetadataRequest for the topics.
>>>> >    b. The TopicPartitionInfo used in TopicDescription needs to add
>>>> >       new fields related to the ELR.
>>>> >    c. The outputs will add the ELR-related fields.
>>>> >
>>>> > On Thu, Sep 14, 2023 at 9:19 PM Colin McCabe <cmcc...@apache.org> wrote:
>>>> >
>>>> >> Hi Calvin,
>>>> >>
>>>> >> Thanks for the changes.
>>>> >>
>>>> >> 1. Earlier I commented that creating "unclean.recovery.strategy" is
>>>> >> not necessary, and we can just reuse the existing
>>>> >> "unclean.leader.election.enable" configuration key. Let's discuss
>>>> >> that.
>>>> >>
>>>> >> 2. I also don't understand why you didn't add a configuration to
>>>> >> enable or disable the Unclean Recovery Manager. This seems like a
>>>> >> very simple way to handle the staging issue which we discussed. The
>>>> >> URM can just be turned off until it is production ready. Let's
>>>> >> discuss this.
>>>> >>
>>>> >> 3. You still need to describe the changes to AdminClient that are
>>>> >> needed to use DescribeTopicRequest.
>>>> >>
>>>> >> Keep at it. It's looking better. :)
>>>> >>
>>>> >> best,
>>>> >> Colin
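To make the four override levels concrete, here is a short example that uses the existing AdminClient API to set the same key at two of those scopes (the node default and a topic override). The bootstrap address and topic name are placeholders.

    import org.apache.kafka.clients.admin.*;
    import org.apache.kafka.common.config.ConfigResource;
    import java.util.*;

    public final class OverrideExample {
        public static void main(String[] args) throws Exception {
            try (Admin admin = Admin.create(Map.<String, Object>of(
                    AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"))) {
                AlterConfigOp set = new AlterConfigOp(
                    new ConfigEntry("unclean.leader.election.enable", "true"),
                    AlterConfigOp.OpType.SET);
                // Level 2: dynamic node default -> ConfigResource(BROKER, "")
                // Level 4: dynamic topic override -> ConfigResource(TOPIC, <topic-name>)
                Map<ConfigResource, Collection<AlterConfigOp>> ops = Map.of(
                    new ConfigResource(ConfigResource.Type.BROKER, ""), List.of(set),
                    new ConfigResource(ConfigResource.Type.TOPIC, "my-topic"), List.of(set));
                admin.incrementalAlterConfigs(ops).all().get();
            }
        }
    }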
>>>> >>
>>>> >> On Thu, Sep 14, 2023, at 11:03, Calvin Liu wrote:
>>>> >> > Hi Colin,
>>>> >> > Thanks for the comments!
>>>> >> >
>>>> >> > I made the following changes:
>>>> >> >
>>>> >> > 1. Simplified the API spec section to only include the diff.
>>>> >> > 2. Reordered the HWM requirement section.
>>>> >> > 3. Removed the URM implementation details, keeping only the
>>>> >> >    characteristics necessary to perform the unclean recovery:
>>>> >> >    1. when to perform the unclean recovery;
>>>> >> >    2. under different configs, how the unclean recovery finds
>>>> >> >       the leader;
>>>> >> >    3. how the configs unclean.leader.election.enable and
>>>> >> >       unclean.recovery.strategy are converted when users
>>>> >> >       enable/disable the unclean recovery.
>>>> >> > 4. More details about how we change the admin client.
>>>> >> > 5. API limits on the GetReplicaLogInfoRequest and
>>>> >> >    DescribeTopicRequest.
>>>> >> > 6. Two metrics added:
>>>> >> >    1. kafka.controller.global_under_min_isr_partition_count
>>>> >> >    2. kafka.controller.unclean_recovery_finished_count
>>>> >> >
>>>> >> > On Wed, Sep 13, 2023 at 10:46 AM Colin McCabe <cmcc...@apache.org> wrote:
>>>> >> >
>>>> >> >> On Tue, Sep 12, 2023, at 17:21, Calvin Liu wrote:
>>>> >> >> > Hi Colin
>>>> >> >> > Thanks for the comments!
>>>> >> >>
>>>> >> >> Hi Calvin,
>>>> >> >>
>>>> >> >> Thanks again for the KIP.
>>>> >> >>
>>>> >> >> One meta-comment: it's usually better to just do a diff on a
>>>> >> >> message spec file or java file if you're including changes to it
>>>> >> >> in the KIP. This is easier to read than looking for "new fields
>>>> >> >> begin" etc. in the text, and gracefully handles the case where
>>>> >> >> existing fields were changed.
>>>> >> >>
>>>> >> >> > Rewrite the Additional High Watermark advancement requirement
>>>> >> >> >
>>>> >> >> > There was feedback on this section that some readers may not
>>>> >> >> > be familiar with HWM and acks=0/1/all requests. This can help
>>>> >> >> > them understand the proposal. I will rewrite this part for
>>>> >> >> > more readability.
>>>> >> >>
>>>> >> >> To be clear, I wasn't suggesting dropping either section. I
>>>> >> >> agree that they add useful background. I was just suggesting
>>>> >> >> that we should discuss the "acks" setting AFTER discussing the
>>>> >> >> new high watermark advancement conditions. We also should
>>>> >> >> discuss acks=0. While it isn't conceptually much different than
>>>> >> >> acks=1 here, its omission from this section is confusing.
>>>> >> >>
>>>> >> >> > Unclean recovery
>>>> >> >> >
>>>> >> >> > The plan is to replace unclean.leader.election.enable with
>>>> >> >> > unclean.recovery.strategy. If the Unclean Recovery is enabled,
>>>> >> >> > then it deals with the three options in
>>>> >> >> > unclean.recovery.strategy.
>>>> >> >> >
>>>> >> >> > Let's refine the Unclean Recovery. We have already taken a lot
>>>> >> >> > of suggestions, and I hope to enhance the durability of Kafka
>>>> >> >> > to the next level with this KIP.
>>>> >> >>
>>>> >> >> I am OK with doing the unclean leader recovery improvements in
>>>> >> >> this KIP. However, I think we need to really work on the
>>>> >> >> configuration settings.
>>>> >> >>
>>>> >> >> Configuration overrides are often quite messy. For example, in
>>>> >> >> the case of log.roll.hours and log.roll.ms, the user has to
>>>> >> >> remember which one takes precedence, and it is not obvious. So,
>>>> >> >> rather than creating a new configuration, why not add additional
>>>> >> >> values to "unclean.leader.election.enable"? I think this will be
>>>> >> >> simpler for people to understand, and simpler in the code as
>>>> >> >> well.
>>>> >> >>
>>>> >> >> What if we continued to use "unclean.leader.election.enable" but
>>>> >> >> extended it so that it took a string? Then the string could have
>>>> >> >> these values:
>>>> >> >>
>>>> >> >> never
>>>> >> >>     never automatically do an unclean leader election under any
>>>> >> >>     conditions
>>>> >> >>
>>>> >> >> false / default
>>>> >> >>     only do an unclean leader election if there may be possible
>>>> >> >>     data loss
>>>> >> >>
>>>> >> >> true / always
>>>> >> >>     always do an unclean leader election if we can't immediately
>>>> >> >>     elect a leader
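A minimal sketch of how the extended string values above could be parsed, assuming the never/default/always semantics Colin lists. The class and names are illustrative, not actual Kafka code.

    import java.util.Locale;

    final class UncleanElectionConfig {
        enum Mode { NEVER, DEFAULT, ALWAYS }

        // Accepts the legacy booleans alongside the new strings. Note that
        // "false" deliberately maps to DEFAULT, not NEVER, as discussed below.
        static Mode parse(String value) {
            switch (value.toLowerCase(Locale.ROOT)) {
                case "never":                 return Mode.NEVER;
                case "false": case "default": return Mode.DEFAULT;
                case "true":  case "always":  return Mode.ALWAYS;
                default:
                    throw new IllegalArgumentException(
                        "Unknown unclean.leader.election.enable value: " + value);
            }
        }
    }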
>>>> >> >>
>>>> >> >> It's a bit awkward that false maps to default rather than to
>>>> >> >> never. But this awkwardness exists if we use two different
>>>> >> >> configuration keys as well. The reason for the awkwardness is
>>>> >> >> that we simply don't want most of the people currently setting
>>>> >> >> unclean.leader.election.enable=false to get the "never"
>>>> >> >> behavior. We have to bite that bullet. Better to be clear and
>>>> >> >> explicit than hide it.
>>>> >> >>
>>>> >> >> Another thing that's a bit awkward is having two different ways
>>>> >> >> to do unclean leader election specified in the KIP. You describe
>>>> >> >> two methods: the simple "choose the last leader" method, and the
>>>> >> >> "unclean recovery manager" method. I understand why you did it
>>>> >> >> this way -- "choose the last leader" is simple, and will help us
>>>> >> >> deliver an implementation quickly, while the URM is preferable
>>>> >> >> in the long term. My suggestion here is to separate the decision
>>>> >> >> of HOW to do unclean leader election from the decision of WHEN
>>>> >> >> to do it.
>>>> >> >>
>>>> >> >> So in other words, have "unclean.leader.election.enable" specify
>>>> >> >> when we do unclean leader election, and have a new configuration
>>>> >> >> like "unclean.recovery.manager.enable" to determine if we use
>>>> >> >> the URM. Presumably the URM will take some time to get fully
>>>> >> >> stable, so this can default to false for a while, and we can
>>>> >> >> flip the default to true when we feel ready.
>>>> >> >>
>>>> >> >> The URM is somewhat under-described here. I think we need a few
>>>> >> >> configurations for it. For example, we need a configuration to
>>>> >> >> specify how long it should wait for a broker to respond to its
>>>> >> >> RPCs before moving on. We also need to understand how the URM
>>>> >> >> interacts with unclean.leader.election.enable=always. I assume
>>>> >> >> that with "always" we will just unconditionally use the URM
>>>> >> >> rather than choosing randomly. But this should be spelled out in
>>>> >> >> the KIP.
>>>> >> >>
>>>> >> >> > DescribeTopicRequest
>>>> >> >> >
>>>> >> >> > 1. Yes, the plan is to replace the MetadataRequest with the
>>>> >> >> >    DescribeTopicRequest for the admin clients. Will check the
>>>> >> >> >    details.
>>>> >> >>
>>>> >> >> Sounds good. But as I said, you need to specify how AdminClient
>>>> >> >> interacts with the new request. This will involve adding some
>>>> >> >> fields to TopicDescription.java. And you need to specify the
>>>> >> >> changes to the kafka-topics.sh command line tool. Otherwise we
>>>> >> >> cannot use the tool to see the new information.
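For illustration, one possible shape for the TopicPartitionInfo additions mentioned above; the field and accessor names are assumptions, not the final API.

    import org.apache.kafka.common.Node;
    import java.util.List;

    // Hypothetical TopicPartitionInfo extended with the ELR fields.
    public final class ElrTopicPartitionInfo {
        private final int partition;
        private final Node leader;
        private final List<Node> replicas;
        private final List<Node> isr;
        private final List<Node> elr;          // eligible leader replicas
        private final List<Node> lastKnownElr; // last known ELR before it emptied

        public ElrTopicPartitionInfo(int partition, Node leader,
                List<Node> replicas, List<Node> isr,
                List<Node> elr, List<Node> lastKnownElr) {
            this.partition = partition;
            this.leader = leader;
            this.replicas = replicas;
            this.isr = isr;
            this.elr = elr;
            this.lastKnownElr = lastKnownElr;
        }

        public int partition() { return partition; }
        public Node leader() { return leader; }
        public List<Node> replicas() { return replicas; }
        public List<Node> isr() { return isr; }
        public List<Node> elr() { return elr; }
        public List<Node> lastKnownElr() { return lastKnownElr; }
    }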
>>>> >> >>
>>>> >> >> The new requests, DescribeTopicRequest and
>>>> >> >> GetReplicaLogInfoRequest, need to have limits placed on them so
>>>> >> >> that their size can't be infinite. We don't want to propagate
>>>> >> >> the current problems of MetadataRequest, where clients can
>>>> >> >> request massive responses that can mess up the JVM when handled.
>>>> >> >>
>>>> >> >> Adding limits is simple for GetReplicaLogInfoRequest -- we can
>>>> >> >> just say that only 2000 partitions at a time can be requested.
>>>> >> >> For DescribeTopicRequest we can probably just limit to 20 topics
>>>> >> >> or something like that, to avoid the complexity of doing
>>>> >> >> pagination in this KIP.
>>>> >> >>
>>>> >> >> > 2. I can let the broker load the ELR info so that it can serve
>>>> >> >> >    the DescribeTopicRequest as well.
>>>> >> >>
>>>> >> >> Yes, it's fine to add to MetadataCache. In fact, you'll be
>>>> >> >> loading it anyway once it's added to PartitionImage.
>>>> >> >>
>>>> >> >> > 3. Yeah, it does not make sense to have the topic id if
>>>> >> >> >    DescribeTopicRequest is only used by the admin client.
>>>> >> >>
>>>> >> >> OK. That makes things simpler. We can always create a new API
>>>> >> >> later (hopefully not in this KIP!) to query by topic ID.
>>>> >> >>
>>>> >> >> > Metrics
>>>> >> >> >
>>>> >> >> > As for overall cluster health metrics, I think under-min-ISR
>>>> >> >> > is still a useful one. ELR is more like a safety belt. When
>>>> >> >> > the ELR is used, the cluster availability has already been
>>>> >> >> > impacted.
>>>> >> >> >
>>>> >> >> > Maybe we can have a metric to count the partitions where
>>>> >> >> > size(ISR) + size(ELR) < min ISR. What do you think?
>>>> >> >>
>>>> >> >> How about:
>>>> >> >>
>>>> >> >> A. a metric for the total number of under-min-isr partitions? We
>>>> >> >> don't have that in Apache Kafka at the moment.
>>>> >> >>
>>>> >> >> B. a metric for the number of unclean leader elections we did
>>>> >> >> (for simplicity, it can reset to 0 on controller restart: we
>>>> >> >> expect people to monitor the change over time anyway)
>>>> >> >>
>>>> >> >> best,
>>>> >> >> Colin
>>>> >> >>
>>>> >> >> > Yeah, for the ongoing unclean recoveries, the controller can
>>>> >> >> > keep an accurate count through failover, because partition
>>>> >> >> > registration can indicate whether a recovery is needed.
>>>> >> >> > However, for the ones that already happened, unless we want to
>>>> >> >> > persist the number somewhere, we can only figure it out from
>>>> >> >> > the log.
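A sketch of how the two proposed metrics could be registered with the Yammer metrics library the broker already uses. The metric names follow the ones proposed in this thread; the surrounding class is hypothetical.

    import com.yammer.metrics.Metrics;
    import com.yammer.metrics.core.Gauge;
    import com.yammer.metrics.core.MetricName;
    import java.util.concurrent.atomic.AtomicInteger;

    final class RecoveryMetrics {
        final AtomicInteger underMinIsrCount = new AtomicInteger(0);
        // Resets to 0 on controller restart, per the discussion above.
        final AtomicInteger uncleanRecoveryFinishedCount = new AtomicInteger(0);

        RecoveryMetrics() {
            Metrics.defaultRegistry().newGauge(
                new MetricName("kafka.controller", "KafkaController",
                               "GlobalUnderMinIsrPartitionCount"),
                new Gauge<Integer>() {
                    @Override public Integer value() { return underMinIsrCount.get(); }
                });
            Metrics.defaultRegistry().newGauge(
                new MetricName("kafka.controller", "KafkaController",
                               "UncleanRecoveryFinishedCount"),
                new Gauge<Integer>() {
                    @Override public Integer value() { return uncleanRecoveryFinishedCount.get(); }
                });
        }
    }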
>>>> >> >> >
>>>> >> >> > On Tue, Sep 12, 2023 at 3:16 PM Colin McCabe <cmcc...@apache.org> wrote:
>>>> >> >> >
>>>> >> >> >> Also, we should have metrics that show what is going on with
>>>> >> >> >> regard to the eligible replica set. I'm not sure exactly what
>>>> >> >> >> to suggest, but something that could identify when things are
>>>> >> >> >> going wrong in the cluster.
>>>> >> >> >>
>>>> >> >> >> For example, maybe a metric for partitions containing
>>>> >> >> >> replicas that are ineligible to be leader? That would show a
>>>> >> >> >> spike when a broker had an unclean restart.
>>>> >> >> >>
>>>> >> >> >> Ideally, we'd also have a metric that indicates when an
>>>> >> >> >> unclean leader election or a recovery happened. It's a bit
>>>> >> >> >> tricky because the simple thing, of tracking it per
>>>> >> >> >> controller, may be a bit confusing during failovers.
>>>> >> >> >>
>>>> >> >> >> best,
>>>> >> >> >> Colin
>>>> >> >> >>
>>>> >> >> >> On Tue, Sep 12, 2023, at 14:25, Colin McCabe wrote:
>>>> >> >> >> > Hi Calvin,
>>>> >> >> >> >
>>>> >> >> >> > Thanks for the KIP. I think this is a great improvement.
>>>> >> >> >> >
>>>> >> >> >> >> Additional High Watermark advance requirement
>>>> >> >> >> >
>>>> >> >> >> > Typo: change "advance" to "advancement"
>>>> >> >> >> >
>>>> >> >> >> >> A bit recap of some key concepts.
>>>> >> >> >> >
>>>> >> >> >> > Typo: change "bit" to "quick"
>>>> >> >> >> >
>>>> >> >> >> >> Ack=1/all produce request. It defines when the Kafka
>>>> >> >> >> >> server should respond to the produce request
>>>> >> >> >> >
>>>> >> >> >> > I think this section would be clearer if we talked about
>>>> >> >> >> > the new high watermark advancement requirement first, and
>>>> >> >> >> > THEN talked about its impact on acks=0, acks=1, and
>>>> >> >> >> > acks=all. acks=all is of course the main case we care about
>>>> >> >> >> > here, so it would be good to lead with that, rather than
>>>> >> >> >> > delving into the technicalities of acks=0/1 first.
>>>> >> >> >> >
>>>> >> >> >> >> Unclean recovery
>>>> >> >> >> >
>>>> >> >> >> > So, here you are introducing a new configuration,
>>>> >> >> >> > unclean.recovery.strategy. The difficult thing here is that
>>>> >> >> >> > there is a lot of overlap with
>>>> >> >> >> > unclean.leader.election.enable. So we have 3 different
>>>> >> >> >> > settings for unclean.recovery.strategy, plus 2 different
>>>> >> >> >> > settings for unclean.leader.election.enable, giving a cross
>>>> >> >> >> > product of 6 different options. The following "unclean
>>>> >> >> >> > recovery manager" section only applies to one of those 6
>>>> >> >> >> > different possibilities (I think?)
>>>> >> >> >> >
>>>> >> >> >> > I simply don't think we need so many different election
>>>> >> >> >> > types. Really, the use-cases we need to cover are people
>>>> >> >> >> > who want NO unclean elections, people who want "the
>>>> >> >> >> > reasonable thing", and people who want availability at all
>>>> >> >> >> > costs.
>>>> >> >> >> >
>>>> >> >> >> > Overall, I feel like the first half of the KIP is about the
>>>> >> >> >> > ELR, and the second half is about reworking unclean leader
>>>> >> >> >> > election. It might be better to move that second half to a
>>>> >> >> >> > separate KIP so that we can figure it out fully. It should
>>>> >> >> >> > be fine to punt this until later and just have the current
>>>> >> >> >> > behavior on empty ELR be waiting for the last known leader
>>>> >> >> >> > to return. After all, that's what we do today.
>>>> >> >> >> >
>>>> >> >> >> >> DescribeTopicRequest
>>>> >> >> >> >
>>>> >> >> >> > Is the intention for AdminClient to use this RPC for
>>>> >> >> >> > Admin#describeTopics? If so, we need to describe all of the
>>>> >> >> >> > changes to the admin client API, as well as changes to
>>>> >> >> >> > command-line tools like kafka-topics.sh (if there are any).
>>>> >> >> >> > For example, you will probably need changes to
>>>> >> >> >> > TopicDescription.java. You will also need to provide all of
>>>> >> >> >> > the things that admin client needs -- for example,
>>>> >> >> >> > TopicAuthorizedOperations.
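A sketch of reading the new information through Admin#describeTopics. describeTopics(), partitions(), and isr() are existing API; the elr() accessor is the assumed addition and is left commented out.

    import org.apache.kafka.clients.admin.*;
    import java.util.List;
    import java.util.Map;

    public final class DescribeExample {
        public static void main(String[] args) throws Exception {
            try (Admin admin = Admin.create(Map.<String, Object>of(
                    AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"))) {
                TopicDescription desc = admin.describeTopics(List.of("my-topic"))
                    .allTopicNames().get().get("my-topic");
                desc.partitions().forEach(p ->
                    System.out.println("partition " + p.partition()
                        + " isr=" + p.isr()
                        /* + " elr=" + p.elr() -- assumed new accessor */));
            }
        }
    }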
>>>> >> >> >> >
>>>> >> >> >> > I also don't think the controller should serve this
>>>> >> >> >> > request. We want to minimize load on the controller. Just
>>>> >> >> >> > like with the other metadata requests, such as
>>>> >> >> >> > MetadataRequest, this should be served by brokers.
>>>> >> >> >> >
>>>> >> >> >> > It's a bit confusing why both topic ID and topic name are
>>>> >> >> >> > provided to this API. Is the intention that callers should
>>>> >> >> >> > set one but not the other? Or both? This needs to be
>>>> >> >> >> > clarified. Also, if we do want to support lookups by UUID,
>>>> >> >> >> > that is another thing that needs to be added to
>>>> >> >> >> > AdminClient.
>>>> >> >> >> >
>>>> >> >> >> > In general, I feel like this should also probably be its
>>>> >> >> >> > own KIP, since it's fairly complex.
>>>> >> >> >> >
>>>> >> >> >> > best,
>>>> >> >> >> > Colin
>>>> >> >> >> >
>>>> >> >> >> > On Thu, Aug 10, 2023, at 15:46, Calvin Liu wrote:
>>>> >> >> >> >> Hi everyone,
>>>> >> >> >> >> I'd like to discuss a series of enhancements to the
>>>> >> >> >> >> replication protocol.
>>>> >> >> >> >>
>>>> >> >> >> >> A partition replica can experience local data loss in
>>>> >> >> >> >> unclean shutdown scenarios where unflushed data in the OS
>>>> >> >> >> >> page cache is lost -- such as an availability zone power
>>>> >> >> >> >> outage or a server error. The Kafka replication protocol
>>>> >> >> >> >> is designed to handle these situations by removing such
>>>> >> >> >> >> replicas from the ISR and only re-adding them once they
>>>> >> >> >> >> have caught up and therefore recovered any lost data. This
>>>> >> >> >> >> prevents replicas that lost an arbitrary log suffix, which
>>>> >> >> >> >> included committed data, from being elected leader.
>>>> >> >> >> >> However, there is a "last replica standing" state which,
>>>> >> >> >> >> when combined with a data-loss unclean shutdown event, can
>>>> >> >> >> >> turn a local data loss scenario into a global data loss
>>>> >> >> >> >> scenario, i.e., committed data can be removed from all
>>>> >> >> >> >> replicas. When the last replica in the ISR experiences an
>>>> >> >> >> >> unclean shutdown and loses committed data, it will be
>>>> >> >> >> >> re-elected leader after starting up again, causing
>>>> >> >> >> >> rejoining followers to truncate their logs and thereby
>>>> >> >> >> >> remove the last copies of the committed records which the
>>>> >> >> >> >> leader lost initially.
>>>> >> >> >> >>
>>>> >> >> >> >> The new KIP will maximize the protection and provide
>>>> >> >> >> >> MinISR-1 tolerance to data-loss unclean shutdown events.
>>>> >> >> >> >>
>>>> >> >> >> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-966%3A+Eligible+Leader+Replicas