Hi Stanislav,

Thanks for the comments. The proposal we are making is not about optimizing Big-O, but about providing a simpler way of stopping a broker from becoming leader. If we go with making this an option and providing a tool that abstracts moving the broker to the end of the preferred leader list, the tool needs to do this for every partition that broker leads. As noted in the comment above, for a broker that is the leader of 1000 partitions, we would have to do this for all 1000 partitions. Having a blacklist instead simplifies this process, and we can provide monitoring/alerts on such a list.
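The tool-based alternative under discussion amounts to generating a reassignment plan that moves the broker to the end of every replica list it currently heads. A minimal sketch of such a generator (the function name and the input map are hypothetical; the output shape follows the JSON accepted by kafka-reassign-partitions.sh):

```python
def demote_broker(assignments, broker_id):
    """Move broker_id to the end of each replica list where it is the
    preferred (first) replica.

    `assignments` maps (topic, partition) -> replica list; the returned
    dict mirrors the JSON consumed by kafka-reassign-partitions.sh.
    """
    plan = []
    for (topic, partition), replicas in sorted(assignments.items()):
        if replicas and replicas[0] == broker_id:
            # Keep the relative order of the other replicas; demote this one.
            reordered = [r for r in replicas if r != broker_id] + [broker_id]
            plan.append({"topic": topic, "partition": partition,
                         "replicas": reordered})
    return {"version": 1, "partitions": plan}
```

For a broker leading 1000 partitions, this produces a 1000-entry plan, which is the O(N) bookkeeping the thread is debating.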
"This sounds like a bit of a hack. If that is the concern, why not propose a KIP that addresses the specific issue?"

Do you mind shedding some light on what issue you are suggesting we propose a KIP for? Replication is a challenge when we are bringing up a new node. If you have a retention period of 3 days, there is honestly no way to do it via online replication without taking a hit on latency SLAs. Is your ask to find a way to fix replication itself when we bring up a new broker with no data?

"Having a blacklist you control still seems like a workaround given that Kafka itself knows when the topic retention would allow you to switch that replica to a leader"

I am not sure how a single zk path holding a list of broker ids makes anything more complicated.

Thanks,
Harsha

On Mon, Sep 09, 2019 at 3:55 PM, Stanislav Kozlovski < stanis...@confluent.io > wrote:

> I agree with Colin that the same result should be achievable through proper abstraction in a tool. Even if that might be "4xO(N)" operations, that is still not a lot - it is still classified as O(N).
>
>> Let's say a healthy broker is hosting 3000 partitions, of which 1000 are the preferred leaders (leader count is 1000). There is a hardware failure (disk/memory, etc.), and the kafka process crashed. We swap this host with another host but keep the same broker.id. When this new broker comes up, it has no historical data, and we manage to have the current last offsets of all partitions set in the replication-offset-checkpoint (if we don't set them, it could cause crazy ReplicaFetcher pulling of historical data from other brokers and cause high cluster latency and other instabilities), so when Kafka is brought up, it quickly catches up as a follower in the ISR. Note, we have auto.leader.rebalance.enable disabled, so it's not serving any traffic as leader (leader count = 0), even though there are 1000 partitions for which this broker is the preferred leader. We need to make this broker not serve traffic for a few hours or days, depending on the SLA of the topic retention requirement, until it has enough historical data.
>
> This sounds like a bit of a hack. If that is the concern, why not propose a KIP that addresses the specific issue? Having a blacklist you control still seems like a workaround given that Kafka itself knows when the topic retention would allow you to switch that replica to a leader.
>
> I really hope we can come up with a solution that avoids complicating the controller and state machine logic further. Could you please list out the main drawbacks of abstracting this away in the reassignments tool (or a new tool)?
>
> On Mon, Sep 9, 2019 at 7:53 AM Colin McCabe < cmcc...@apache.org > wrote:
>
>> On Sat, Sep 7, 2019, at 09:21, Harsha Chintalapani wrote:
>>
>>> Hi Colin,
>>> Can you give us more details on why you don't want this to be part of Kafka core? You are proposing KIP-500, which will take away ZooKeeper, and writing these interim tools to change the ZooKeeper metadata doesn't make sense to me.
>>
>> Hi Harsha,
>>
>> The reassignment API described in KIP-455, which will be part of Kafka 2.4, doesn't rely on ZooKeeper. This API will stay the same after KIP-500 is implemented.
>>
>>> As George pointed out, there are several benefits to having this in the system itself instead of asking users to hack a bunch of json files to deal with an outage scenario.
>>
>> In both cases, the user just has to run a shell command, right? In both cases, the user has to remember to undo the command later when they want the broker to be treated normally again.
>> And in both cases, the user should probably be running an external rebalancing tool to avoid having to run these commands manually. :)
>>
>> best,
>> Colin
>>
>>> Thanks,
>>> Harsha
>>>
>>> On Fri, Sep 6, 2019 at 4:36 PM George Li < sql_consult...@yahoo.com > wrote:
>>>
>>>> Hi Colin,
>>>>
>>>> Thanks for the feedback. The "separate set of metadata about blacklists" in KIP-491 is just a list of broker ids -- usually 1 or 2, or at most a few, per cluster. That should be easier than keeping json files. E.g., what if we first blacklist broker_id_1, and then another broker, broker_id_2, has issues, and we need to write out another json file to restore later (and in which order)? Using the blacklist, we can just add broker_id_2 to the existing list, and remove whichever broker_id returns to a good state, without worrying about how (i.e., in what order the brokers were blacklisted) to restore.
>>>>
>>>> For the topic-level config, the blacklist will be tied to the topic/partition (e.g. Configs: topic.preferred.leader.blacklist=0:101,102;1:103, where 0 & 1 are the partition numbers and 101, 102, 103 are the blacklisted broker_ids), and it is easy to update/remove, with no need for external json files.
>>>>
>>>> Thanks,
>>>> George
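George's proposed topic-level value packs partition-to-broker pairs into a single string. A sketch of how a consumer of that config might parse it (the config key and the format are KIP-491's proposal, not a shipped Kafka feature):

```python
def parse_blacklist(value):
    """Parse the topic-level value sketched in the KIP-491 discussion,
    e.g. "0:101,102;1:103" -> {0: {101, 102}, 1: {103}}
    (partition number -> set of blacklisted broker ids).
    """
    result = {}
    for clause in value.split(";"):
        partition, brokers = clause.split(":")
        result[int(partition)] = {int(b) for b in brokers.split(",")}
    return result
```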
>>>> On Friday, September 6, 2019, 02:20:33 PM PDT, Colin McCabe < cmcc...@apache.org > wrote:
>>>>
>>>> One possibility would be writing a new command-line tool that would deprioritize a given replica using the new KIP-455 API. Then it could write out a JSON file containing the old priorities, which could be restored when (or if) we needed to do so. This seems like it might be simpler and easier to maintain than a separate set of metadata about blacklists.
>>>>
>>>> best,
>>>> Colin
>>>>
>>>> On Fri, Sep 6, 2019, at 11:58, George Li wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Just want to ping and bubble up the discussion of KIP-491.
>>>>>
>>>>> At large scale, with thousands of brokers across many clusters, frequent hardware failures are common. Although running reassignments to change the preferred leaders is a workaround, it incurs unnecessary additional work compared with the preferred leader blacklist proposed in KIP-491, and it is hard to scale.
>>>>>
>>>>> I am wondering whether others running Kafka at a big scale run into the same problem.
>>>>>
>>>>> Satish,
>>>>>
>>>>> Regarding your previous question about whether there is a use case for a topic-level preferred leader "blacklist", I thought of one use case: improving rebalance/reassignment. A large partition will usually cause performance/stability issues during reassignment, so the plan would be to have, say, the new replica start with the leader's latest offset (this way the replica is almost instantly in the ISR and the reassignment completes), and to put this partition's new replica into the preferred leader "blacklist" in the topic-level config for that partition.
>>>>> After some time (the retention time), this new replica has caught up and is ready to serve traffic, and we update/remove the TopicConfig for this partition's preferred leader blacklist.
>>>>>
>>>>> I will update KIP-491 later for this use case of a topic-level config for the Preferred Leader Blacklist.
>>>>>
>>>>> Thanks,
>>>>> George
>>>>>
>>>>> On Wednesday, August 7, 2019, 07:43:55 PM PDT, George Li < sql_consult...@yahoo.com > wrote:
>>>>>
>>>>> Hi Colin,
>>>>>
>>>>>> In your example, I think we're comparing apples and oranges. You started by outlining a scenario where "an empty broker... comes up... [without] any leadership[s]." But then you criticize using reassignment to switch the order of preferred replicas because it "would not actually switch the leader automatically." If the empty broker doesn't have any leaderships, there is nothing to be switched, right?
>>>>>
>>>>> Let me explain the details of this particular use case, for comparing apples to apples.
>>>>>
>>>>> Let's say a healthy broker is hosting 3000 partitions, of which 1000 are the preferred leaders (leader count is 1000). There is a hardware failure (disk/memory, etc.), and the kafka process crashed. We swap this host with another host but keep the same broker.id. When this new broker comes up, it has no historical data, and we manage to have the current last offsets of all partitions set in the replication-offset-checkpoint (if we don't set them, it could cause crazy ReplicaFetcher pulling of historical data from other brokers and cause high cluster latency and other instabilities), so when Kafka is brought up, it quickly catches up as a follower in the ISR. Note, we have auto.leader.rebalance.enable disabled, so it's not serving any traffic as leader (leader count = 0), even though there are 1000 partitions for which this broker is the preferred leader.
>>>>>
>>>>> We need to make this broker not serve traffic for a few hours or days, depending on the SLA of the topic retention requirement, until it has enough historical data.
>>>>>
>>>>> * The traditional way, using reassignments to move this broker to the end of the assignment for the 1000 partitions where it's the preferred leader, is an O(N) operation, and from my experience we can't submit all 1000 at the same time without causing higher latencies, even though the reassignment in this case can complete almost instantly. After a few hours/days, when this broker is ready to serve traffic, we have to run reassignments again to restore the preferred leaders of those 1000 partitions for this broker: another O(N) operation. Then run preferred leader election: O(N) again. So in total, 3 x O(N) operations. The point is, since the new empty broker is expected to be the same as the old one in terms of hosting partitions/leaders, it would seem unnecessary to do reassignments (reordering of replicas) while the broker is catching up.
>>>>>
>>>>> * With the new Preferred Leader "Blacklist" feature, we just need to put in a dynamic config to indicate that this broker should be considered for leadership (by preferred leader election, broker failover, or unclean leader election) at the lowest priority. NO need to run any reassignments. After a few hours/days, when this broker is ready, remove the dynamic config and run preferred leader election, and this broker will serve traffic for the 1000 partitions it was originally the preferred leader of. So in total, 1 x O(N) operation.
>>>>>
>>>>> If auto.leader.rebalance.enable is enabled, the Preferred Leader "Blacklist" can be put in place before Kafka is started, to prevent this broker from serving traffic. In the traditional way of running reassignments, once the broker is up with auto.leader.rebalance.enable, if leadership starts going to this new empty broker, we might have to run preferred leader election after the reassignments to remove its leaderships. E.g. the (1,2,3) => (2,3,1) reassignment only changes the ordering; 1 remains the current leader, and preferred leader election is needed to change the leader to 2 after the reassignment. So, potentially one more O(N) operation.
>>>>>
>>>>> I hope the above example shows how easy it is to "blacklist" a broker from serving leadership. For someone managing production Kafka clusters, it's important to react fast to certain alerts and mitigate/resolve issues. Together with the other use cases I listed in KIP-491, I think this feature can make the Kafka product easier to manage/operate.
>>>>>
>>>>>> In general, using an external rebalancing tool like Cruise Control is a good idea to keep things balanced without having to deal with manual rebalancing. We expect more and more people who have a complex or large cluster will start using tools like this.
>>>>>>
>>>>>> However, if you choose to do manual rebalancing, it shouldn't be that bad. You would save the existing partition ordering before making your changes, then make your changes (perhaps by running a simple command line tool that switches the order of the replicas). Then, once you felt like the broker was ready to serve traffic, you could just re-apply the old ordering which you had saved.
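The manual save-then-restore flow Colin describes can be sketched as a pair of kafka-reassign-partitions-style JSON plans produced in one pass (the function name and input map are hypothetical):

```python
import json

def make_plans(assignments, broker_id):
    """Return (demotion_json, rollback_json): the first moves broker_id to
    the end of each replica list it appears in, the second restores the
    current ordering. `assignments` maps (topic, partition) -> replica list;
    the JSON shape mirrors kafka-reassign-partitions.sh input."""
    demote, restore = [], []
    for (topic, partition), replicas in sorted(assignments.items()):
        # Skip partitions where the broker is absent or already last.
        if broker_id in replicas and replicas[-1] != broker_id:
            reordered = [r for r in replicas if r != broker_id] + [broker_id]
            demote.append({"topic": topic, "partition": partition,
                           "replicas": reordered})
            restore.append({"topic": topic, "partition": partition,
                            "replicas": replicas})
    return (json.dumps({"version": 1, "partitions": demote}),
            json.dumps({"version": 1, "partitions": restore}))
```

Apply the first plan now and keep the second on disk; that file is exactly the "old priorities" JSON the tool-based approach would have to track per incident.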
>>>>> We do have our own rebalancing tool, which has its own criteria like rack diversity, disk usage, spreading partitions/leaders across all brokers in the cluster per topic, leadership bytes/BytesIn served per broker, etc. We can run reassignments. The point is whether that is really necessary, and whether there is a more effective, easier, safer way to do it.
>>>>>
>>>>> Take another use case example: taking leadership away from a busy controller to give it more capacity to serve metadata requests and other work. The controller can fail over; with the preferred leader "blacklist", we do not have to run reassignments again when the controller fails over, just change the blacklisted broker_id.
>>>>>
>>>>>> I was thinking about a PlacementPolicy filling the role of preventing people from creating single-replica partitions on a node that we didn't want to ever be the leader. I thought that it could also prevent people from designating those nodes as preferred leaders during topic creation, or Kafka from doing it during random topic creation. I was assuming that the PlacementPolicy would determine which nodes were which through static configuration keys. I agree static configuration keys are somewhat less flexible than dynamic configuration.
>>>>>
>>>>> I think the single-replica partition might not be a good example. There should not be any single-replica partitions at all; if there are, it's probably from trying to save disk space with fewer replicas. I think the minimum should be at least 2. A user purposely creating a single-replica partition takes full responsibility for data loss and unavailability when a broker fails or is under maintenance.
>>>>>
>>>>> I think it would be better to use dynamic instead of static config. I also think it would be better to have a topic creation policy enforced in the Kafka server OR in an external service. We have an external/central service managing topic creation/partition expansion which takes into account rack diversity, replication factor (2, 3 or 4 depending on cluster/topic type), the policy for replicating the topic between kafka clusters, etc.
>>>>>
>>>>> Thanks,
>>>>> George
>>>>>
>>>>> On Wednesday, August 7, 2019, 05:41:28 PM PDT, Colin McCabe < cmcc...@apache.org > wrote:
>>>>>
>>>>> On Wed, Aug 7, 2019, at 12:48, George Li wrote:
>>>>>
>>>>>> Hi Colin,
>>>>>>
>>>>>> Thanks for your feedback. Comments below:
>>>>>>
>>>>>>> Even if you have a way of blacklisting an entire broker all at once, you still would need to run a leader election for each partition where you want to move the leader off of the blacklisted broker. So the operation is still O(N) in that sense -- you have to do something per partition.
>>>>>>
>>>>>> For a failed broker swapped with an empty broker: when it comes up, it will not have any leaderships, and we would like it to remain without leaderships for a couple of hours or days. So there is no preferred leader election needed, which would incur an O(N) operation in this case. Putting in the preferred leader blacklist would safeguard this broker from serving traffic during that time; otherwise, if another broker fails (if this broker is 1st or 2nd in the assignment), or someone runs preferred leader election, this new "empty" broker can still get leaderships.
>>>>>>
>>>>>> Also, running reassignment to change the ordering of the preferred leaders would not actually switch the leader automatically, e.g. (1,2,3) => (2,3,1), unless preferred leader election is run to switch the current leader from 1 to 2. So the operation is at least 2 x O(N), and then after the broker is back to normal, another 2 x O(N) to roll back.
>>>>>
>>>>> Hi George,
>>>>>
>>>>> Hmm. I guess I'm still on the fence about this feature.
>>>>>
>>>>> In your example, I think we're comparing apples and oranges. You started by outlining a scenario where "an empty broker... comes up... [without] any leadership[s]." But then you criticize using reassignment to switch the order of preferred replicas because it "would not actually switch the leader automatically." If the empty broker doesn't have any leaderships, there is nothing to be switched, right?
>>>>>
>>>>>>> In general, reassignment will get a lot easier and quicker once KIP-455 is implemented. Reassignments that just change the order of preferred replicas for a specific partition should complete pretty much instantly.
>>>>>>>
>>>>>>> I think it's simpler and easier just to have one source of truth for what the preferred replica is for a partition, rather than two. So for me, the fact that the replica assignment ordering isn't changed is actually a big disadvantage of this KIP. If you are a new user (or just an existing user that didn't read all of the documentation) and you just look at the replica assignment, you might be confused by why a particular broker wasn't getting any leaderships, even though it appeared like it should. More mechanisms mean more complexity for users and developers most of the time.
>>>>>>
>>>>>> I would like to stress the point that running reassignment to change the ordering of the replicas (putting a broker at the end of the partition assignment) is unnecessary, because after some time the broker is caught up and can start serving traffic, and then reassignments need to be run again to "roll back" to the previous state. As I mentioned in KIP-491, this is just tedious work.
>>>>> In general, using an external rebalancing tool like Cruise Control is a good idea to keep things balanced without having to deal with manual rebalancing. We expect more and more people who have a complex or large cluster will start using tools like this.
>>>>>
>>>>> However, if you choose to do manual rebalancing, it shouldn't be that bad. You would save the existing partition ordering before making your changes, then make your changes (perhaps by running a simple command line tool that switches the order of the replicas). Then, once you felt like the broker was ready to serve traffic, you could just re-apply the old ordering which you had saved.
>>>>>
>>>>>> I agree this might introduce some complexities for users/developers. But if this feature is good, and well documented, it is good for the kafka product/community. Just like KIP-460, which enables unclean leader election to override the topic-level/broker-level config `unclean.leader.election.enable`.
>>>>>>
>>>>>>> I agree that it would be nice if we could treat some brokers differently for the purposes of placing replicas, selecting leaders, etc. Right now, we don't have any way of implementing that without forking the broker. I would support a new PlacementPolicy class that would close this gap. But I don't think this KIP is flexible enough to fill this role. For example, it can't prevent users from creating new single-replica topics that get put on the "bad" replica. Perhaps we should reopen the discussion about https://cwiki.apache.org/confluence/display/KAFKA/KIP-201%3A+Rationalising+Policy+interfaces
>>>>>>
>>>>>> Creating topics with a single replica is beyond what KIP-491 is trying to achieve. The user needs to take responsibility for doing that. I do see some Samza clients notoriously creating single-replica topics, and that gets flagged by alerts, because a single broker being down or under maintenance will cause offline partitions. With the KIP-491 preferred leader "blacklist", a single replica will still serve as leader, because there is no alternative replica to be chosen as leader.
>>>>>>
>>>>>> Even with a new PlacementPolicy for topic creation/partition expansion, it would still need the blacklist info (e.g. a zk path node, or a broker-level/topic-level config) to "blacklist" the broker from being preferred leader. Would that be the same as what KIP-491 is introducing?
>>>>>
>>>>> I was thinking about a PlacementPolicy filling the role of preventing people from creating single-replica partitions on a node that we didn't want to ever be the leader. I thought that it could also prevent people from designating those nodes as preferred leaders during topic creation, or Kafka from doing it during random topic creation. I was assuming that the PlacementPolicy would determine which nodes were which through static configuration keys. I agree static configuration keys are somewhat less flexible than dynamic configuration.
>>>>>
>>>>> best,
>>>>> Colin
>>>>>
>>>>>> Thanks,
>>>>>> George
>>>>>>
>>>>>> On Wednesday, August 7, 2019, 11:01:51 AM PDT, Colin McCabe < cmcc...@apache.org > wrote:
>>>>>>
>>>>>> On Fri, Aug 2, 2019, at 20:02, George Li wrote:
>>>>>>
>>>>>>> Hi Colin,
>>>>>>> Thanks for looking into this KIP. Sorry for the late response; I have been busy.
>>>>>>>
>>>>>>> If a cluster has MANY topic partitions, moving this "blacklist" broker to the end of the replica list is still a rather "big" operation, involving submitting reassignments. The KIP-491 way of blacklisting is much simpler/easier and can be undone easily, without changing the replica assignment ordering.
>>>>>>> >>>>>>> >>>>>> >>>>>> >>>>>> >>>>>> Hi George, >>>>>> >>>>>> >>>>>> >>>>>> Even if you have a way of blacklisting an entire broker all at >>>>>> >>>>>> >>>>> >>>>> >>>> >>>> >>> >>> >> >> >> >> once, >> >> >>> >>>> >>>>> >>>>>> >>>>>> >>>>>> you still would need to run a leader election for each partition >>>>>> >>>>>> >>>>> >>>>> >>>> >>>> >>> >>> >> >> >> >> where >> >> >>> >>>> >>>>> >>>>>> >>>>>> >>>>>> you want to move the leader off of the blacklisted broker. So the >>>>>> operation is still O(N) in that sense-- you have to do something >>>>>> >>>>>> >>>>> >>>>> >>>> >>>> >>> >>> >> >> >> >> per >> >> >>> >>>> >>>>> >>>>>> >>>>>> >>>>>> partition. >>>>>> >>>>>> >>>>>> >>>>>> In general, reassignment will get a lot easier and quicker once >>>>>> >>>>>> >>>>> >>>>> >>>> >>>> >>>> >>>> KIP-455 >>>> >>>> >>>>> >>>>>> >>>>>> >>>>>> is implemented. Reassignments that just change the order of >>>>>> >>>>>> >>>>> >>>>> >>>> >>>> >>> >>> >> >> >> >> preferred >> >> >>> >>>> >>>>> >>>>>> >>>>>> >>>>>> replicas for a specific partition should complete pretty much >>>>>> >>>>>> >>>>> >>>>> >>>> >>>> >>>> >>>> instantly. >>>> >>>> >>>>> >>>>>> >>>>>> >>>>>> I think it's simpler and easier just to have one source of truth >>>>>> >>>>>> >>>>> >>>>> >>>> >>>> >>> >>> >> >> >> >> for >> >> >>> >>>> >>>>> >>>>>> >>>>>> >>>>>> what the preferred replica is for a partition, rather than two. So >>>>>> >>>>>> >>>>> >>>>> >>>> >>>> >>>> >>>> for >>>> >>>> >>>>> >>>>>> >>>>>> >>>>>> me, the fact that the replica assignment ordering isn't changed is >>>>>> actually a big disadvantage of this KIP. 
If you are a new user (or just >>>>>> an >>>>>> existing user that didn't read all of the documentation) >>>>>> >>>>>> >>>>> >>>>> >>>> >>>> >>> >>> >> >> >> >> and >> >> >>> >>>> >>>>> >>>>>> >>>>>> >>>>>> you just look at the replica assignment, you might be confused by >>>>>> >>>>>> >>>>> >>>>> >>>> >>>> >>> >>> >> >> >> >> why >> >> >>> >>>> >>>> >>>> a >>>> >>>> >>>>> >>>>>> >>>>>> >>>>>> particular broker wasn't getting any leaderships, even though it appeared >>>>>> like it should. More mechanisms mean more complexity for users and >>>>>> developers most of the time. >>>>>> >>>>>> >>>>>>> >>>>>>> >>>>>>> Major use case for me, a failed broker got swapped with new >>>>>>> >>>>>>> >>>>>> >>>>>> >>>>> >>>>> >>>> >>>> >>>> >>>> hardware, >>>> >>>> >>>>> >>>>>> >>>>>>> >>>>>>> >>>>>>> and starts up as empty (with latest offset of all partitions), >>>>>>> >>>>>>> >>>>>> >>>>>> >>>>> >>>>> >>>> >>>> >>> >>> >> >> >> >> the >> >> >>> >>>> >>>> >>>> SLA >>>> >>>> >>>>> >>>>>> >>>>>>> >>>>>>> >>>>>>> of retention is 1 day, so before this broker is up to be in-sync >>>>>>> >>>>>>> >>>>>> >>>>>> >>>>> >>>>> >>>> >>>> >>> >>> >> >> >> >> for >> >> >>> >>>> >>>> >>>> 1 >>>> >>>> >>>>> >>>>>> >>>>>>> >>>>>>> >>>>>>> day, we would like to blacklist this broker from serving traffic. >>>>>>> >>>>>>> >>>>>> >>>>>> >>>>> >>>>> >>>> >>>> >>>> >>>> after >>>> >>>> >>>>> >>>>>> >>>>>>> >>>>>>> >>>>>>> 1 day, the blacklist is removed and run preferred leader >>>>>>> >>>>>>> >>>>>> >>>>>> >>>>> >>>>> >>>> >>>> >>> >>> >> >> >> >> election. >> >> >>> >>>> >>>>> >>>>>> >>>>>>> >>>>>>> >>>>>>> This way, no need to run reassignments before/after. This is the >>>>>>> "temporary" use-case. >>>>>>> >>>>>>> >>>>>> >>>>>> >>>>>> >>>>>> What if we just add an option to the reassignment tool to generate >>>>>> >>>>>> >>>>> >>>>> >>>> >>>> >>> >>> >> >> >> >> a >> >> >>> >>>> >>>>> >>>>>> >>>>>> >>>>>> plan to move all the leaders off of a specific broker? 
The tool could also run a leader election as well. That would be a simple way of doing this without adding new mechanisms or broker-side configurations, etc.

> There are use-cases where this Preferred Leader "blacklist" can be somewhat permanent, as I explained in the AWS data center instances vs. on-premises data center bare metal machines (heterogeneous hardware) case: the AWS broker_ids will be blacklisted. So new topics created, or existing topic expansion, would not make them serve traffic even though they could be the preferred leader.

I agree that it would be nice if we could treat some brokers differently for the purposes of placing replicas, selecting leaders, etc. Right now, we don't have any way of implementing that without forking the broker. I would support a new PlacementPolicy class that would close this gap.
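As a rough illustration of what such a tool option could emit (a hypothetical helper sketched here, not an actual Kafka API), it would reorder each replica list so the target broker comes last, in the same shape as the reassignment JSON that kafka-reassign-partitions.sh accepts:

```python
# Sketch of the proposed tool option: given the current assignment, emit a
# reassignment plan that moves the target broker to the end of every replica
# list it appears in, so a subsequent preferred leader election picks another
# replica. demote_broker_plan is an illustrative name, not a real Kafka tool.
import json

def demote_broker_plan(assignment, broker_id):
    """assignment: {(topic, partition): [replica ids in preference order]}"""
    partitions = []
    for (topic, partition), replicas in assignment.items():
        # Only partitions where the broker is a replica and not already last
        # need a change; ordering of the other replicas is preserved.
        if broker_id in replicas and replicas[-1] != broker_id:
            reordered = [b for b in replicas if b != broker_id] + [broker_id]
            partitions.append({"topic": topic, "partition": partition,
                               "replicas": reordered})
    return {"version": 1, "partitions": partitions}

plan = demote_broker_plan({("events", 0): [1, 2, 3], ("events", 1): [2, 1, 3]}, 1)
print(json.dumps(plan, indent=2))
```

Note the plan only touches partitions where the broker is currently ahead of another replica, which keeps the generated JSON minimal.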
But I don't think this KIP is flexible enough to fill this role. For example, it can't prevent users from creating new single-replica topics that get put on the "bad" replica. Perhaps we should reopen the discussion about https://cwiki.apache.org/confluence/display/KAFKA/KIP-201%3A+Rationalising+Policy+interfaces

regards,
Colin

> Please let me know if there are more questions.
>
> Thanks,
> George

On Thursday, July 25, 2019, 08:38:28 AM PDT, Colin McCabe <cmcc...@apache.org> wrote:

We still want to give the "blacklisted" broker the leadership if nobody else is available. Therefore, isn't putting a broker on the blacklist pretty much the same as moving it to the last entry in the replicas list and then triggering a preferred leader election?
If we want this to be undone after a certain amount of time, or under certain conditions, that seems like something that would be more effectively done by an external system, rather than putting all these policies into Kafka.

best,
Colin

On Fri, Jul 19, 2019, at 18:23, George Li wrote:

Hi Satish,

Thanks for the reviews and feedbacks.

>> The following is the requirements this KIP is trying to accomplish:
>
> This can be moved to the "Proposed changes" section.

Updated the KIP-491.

>> The logic to determine the priority/order of which broker should be preferred leader should be modified. The broker in the preferred leader blacklist should be moved to the end (lowest priority) when determining leadership.
> I believe there is no change required in the ordering of the preferred replica list. Brokers in the preferred leader blacklist are skipped until other brokers in the list are unavailable.

Yes, the partition assignment remains the same, both replicas and ordering. The blacklist logic can be optimized during implementation.

>> The blacklist can be at the broker level. However, there might be use cases where a specific topic should blacklist particular brokers, which would be at the Topic level Config. For the use cases of this KIP, it seems that a broker level blacklist would suffice. Topic level preferred leader blacklist might be future enhancement work.
>
> I agree that the broker level preferred leader blacklist would be sufficient. Do you have any use cases which require topic level preferred blacklist?

I don't have any concrete use cases for a Topic level preferred leader blacklist.
One scenario I can think of is when a broker has high CPU usage: try to identify the big topics (high MsgIn, high BytesIn, etc.) and move the leaders away from this broker. Before doing an actual reassignment to change its preferred leaders, put this broker in the preferred_leader_blacklist in the Topic Level config, run preferred leader election, and see whether CPU decreases for this broker. If yes, then do the reassignments to make the preferred leader change "permanent" (the topic may have many partitions, like 256, quite a few of which have this broker as preferred leader). So this Topic Level config is an easy way of doing a trial and checking the result.

> You can add the below workaround as an item in the rejected alternatives section: "Reassigning all the topic/partitions which the intended broker is a replica for."

Updated the KIP-491.

Thanks,
George

On Friday, July 19, 2019, 08:20:22 AM PDT, Satish Duggana <satish.dugg...@gmail.com> wrote:

Thanks for the KIP. I have put my comments below.

This is a nice improvement to avoid cumbersome maintenance.

> The following is the requirements this KIP is trying to accomplish:
> The ability to add and remove the preferred leader deprioritized list/blacklist, e.g. a new ZK path/node or new dynamic config.
This can be moved to the "Proposed changes" section.

> The logic to determine the priority/order of which broker should be preferred leader should be modified. The broker in the preferred leader blacklist should be moved to the end (lowest priority) when determining leadership.

I believe there is no change required in the ordering of the preferred replica list. Brokers in the preferred leader blacklist are skipped until other brokers in the list are unavailable.

> The blacklist can be at the broker level. However, there might be use cases where a specific topic should blacklist particular brokers, which would be at the Topic level Config. For the use cases of this KIP, it seems that a broker level blacklist would suffice. Topic level preferred leader blacklist might be future enhancement work.

I agree that the broker level preferred leader blacklist would be sufficient. Do you have any use cases which require topic level preferred blacklist?

You can add the below workaround as an item in the rejected alternatives section: "Reassigning all the topic/partitions which the intended broker is a replica for."

Thanks,
Satish.

On Fri, Jul 19, 2019 at 7:33 AM Stanislav Kozlovski <stanis...@confluent.io> wrote:

Hey George,

Thanks for the KIP, it's an interesting idea.
I was wondering whether we could achieve the same thing via the kafka-reassign-partitions tool. As you had also said in the JIRA, it is true that this is currently very tedious with the tool. My thought is that we could improve the tool and give it the notion of a "blacklisted preferred leader".

This would have some benefits:
- More fine-grained control over the blacklist. We may not want to blacklist all the preferred leaders, as that would make the blacklisted broker a follower of last resort, which is not very useful. In the case of an underpowered AWS machine or a controller, you might overshoot and make the broker very underutilized if you completely make it leaderless.
- It is not permanent. If we are to have a blacklist-leaders config, rebalancing tools would also need to know about it and manipulate/respect it to achieve a fair balance.

It seems like both problems are tied to balancing partitions; it's just that KIP-491's use case wants to balance them against other factors in a more nuanced way. It makes sense to have both be done from the same place.

To make note of the motivation section:

> Avoid bouncing broker in order to lose its leadership

The recommended way to make a broker lose its leadership is to run a reassignment on its partitions.

> The cross-data center cluster has AWS cloud instances which have less computing power

We recommend running Kafka on homogeneous machines. It would be cool if the system supported more flexibility in that regard, but that is more nuanced, and a preferred leader blacklist may not be the best first approach to the issue.

Adding a new config which can fundamentally change the way replication is done is complex, both for the system (the replication code is complex enough) and the user. Users would have another potential config that could backfire on them -- e.g. if left forgotten.

Could you think of any downsides to implementing this functionality (or a variation of it) in the kafka-reassign-partitions.sh tool?
One downside I can see is that we would not have it handle new partitions created after the "blacklist operation". As a first iteration I think that may be acceptable.

Thanks,
Stanislav

On Fri, Jul 19, 2019 at 3:20 AM George Li <sql_consult...@yahoo.com.invalid> wrote:

Hi,

Pinging the list for feedback on KIP-491 (https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=120736982).

Thanks,
George

On Saturday, July 13, 2019, 08:43:25 PM PDT, George Li <sql_consult...@yahoo.com.INVALID> wrote:

Hi,

I have created KIP-491 (https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=120736982) for putting a broker on the preferred leader blacklist, or deprioritized list, so that when determining leadership it is moved to the lowest priority, for some of the listed use-cases.

Please provide your comments/feedbacks.

Thanks,
George

----- Forwarded Message -----
From: Jose Armando Garcia Sancio (JIRA) <j...@apache.org>
To: "sql_consult...@yahoo.com" <sql_consult...@yahoo.com>
Sent: Tuesday, July 9, 2019, 01:06:05 PM PDT
Subject: [jira] [Commented] (KAFKA-8638) Preferred Leader Blacklist (deprioritized list)

[https://issues.apache.org/jira/browse/KAFKA-8638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16881511#comment-16881511]

Jose Armando Garcia Sancio commented on KAFKA-8638:
---------------------------------------------------

Thanks for feedback and clear use cases [~sql_consulting].

> Preferred Leader Blacklist (deprioritized list)
> -----------------------------------------------
>
>                 Key: KAFKA-8638
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8638
>             Project: Kafka
>          Issue Type: Improvement
>          Components: config, controller, core
>    Affects Versions: 1.1.1, 2.3.0, 2.2.1
>            Reporter: GEORGE LI
>            Assignee: GEORGE LI
>            Priority: Major
>
> Currently, the kafka preferred leader election will pick the broker_id in the topic/partition replica assignments in a priority order when the broker is in ISR. The preferred leader is the broker id in the first position of the replica list.
> There are use-cases where, even though the first broker in the replica assignment is in ISR, there is a need for it to be moved to the end of the ordering (lowest priority) when deciding leadership during preferred leader election.
>
> Let's use a topic/partition with replicas (1,2,3) as an example. 1 is the preferred leader. When preferred leader election is run, it will pick 1 as the leader if it is in ISR; if 1 is not online and in ISR, then pick 2; if 2 is not in ISR, then pick 3 as the leader. There are use cases where, even if 1 is in ISR, we would like it to be moved to the end of the ordering (lowest priority) when deciding leadership during preferred leader election. Below is a list of use cases:
>
> * If broker_id 1 is a swapped failed host, brought up with the last segments or the latest offset but without historical data (there is another effort on this), it is better for it not to serve leadership until it has caught up.
>> * A cross-data-center cluster has AWS instances with less computing power
>> than the on-prem bare-metal machines. We could put the AWS broker_ids in
>> the Preferred Leader Blacklist, so on-prem brokers can be elected leaders,
>> without changing the reassignment ordering of the replicas.
>>
>> * Broker_id 1 is constantly losing leadership after some time: "flapping".
>> We would want to exclude 1 from being a leader unless all other brokers of
>> this topic/partition are offline. The "flapping" effect was seen in the
>> past when 2 or more brokers were bad: as they lost leadership constantly
>> and quickly, the sets of partition replicas they belonged to saw leadership
>> change constantly. The ultimate solution is to swap out these bad hosts.
>> But for quick mitigation, we can also put the bad hosts in the Preferred
>> Leader Blacklist to move their priority for being elected as leaders to
>> the lowest.
>>
>> * The controller is busy serving an extra load of metadata requests and
>> other tasks, and we would like to move the controller's leaderships to
>> other brokers to lower its CPU load. Currently, bouncing the broker to
>> lose leadership does not work for the controller, because after the bounce
>> the controller fails over to another broker.
>> * Avoid bouncing a broker in order to lose its leadership: it would be
>> good to have a way to specify which broker should be excluded from serving
>> traffic/leadership (without changing the replica assignment ordering via
>> reassignment, even though that's quick), and then run preferred leader
>> election. A bouncing broker causes temporary URP, and sometimes other
>> issues. Also, a bounce of a broker (e.g. broker_id 1) can temporarily shed
>> all of its leadership, but if another broker (e.g.
>> broker_id 2) fails or gets bounced, some of its leaderships will likely
>> fail over to broker_id 1 on a replica set with 3 brokers. If broker_id 1
>> is in the blacklist, then in such a scenario, even with broker_id 2
>> offline, the 3rd broker can take leadership.
>>
>> The current work-around for the above is to change the topic/partition's
>> replica reassignment to move broker_id 1 from the first position to the
>> last position and run preferred leader election, e.g.
>> (1, 2, 3) => (2, 3, 1). This changes the replica assignment, and we need
>> to keep track of the original one and restore it if things change (e.g.
>> the controller fails over to another broker, or the swapped empty broker
>> catches up). That's a rather tedious task.
>>
>> --
>> This message was sent by Atlassian JIRA
>> (v7.6.3#76005)
>
> --
> Best,
> Stanislav
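For illustration, the work-around quoted above (move broker 1 to the last position in every affected partition, then run preferred leader election) amounts to generating a reassignment plan like the sketch below. The topic names are made up; the JSON shape is the one accepted by kafka-reassign-partitions.sh:

```python
import json

def demote_broker(assignment, broker_id):
    """Move broker_id to the end of the replica list so it has the
    lowest election priority, e.g. [1, 2, 3] -> [2, 3, 1] for broker 1."""
    if broker_id not in assignment:
        return list(assignment)  # nothing to demote
    return [b for b in assignment if b != broker_id] + [broker_id]

# Current assignments for the partitions broker 1 leads (example data).
partitions = {("topicA", 0): [1, 2, 3], ("topicA", 1): [1, 3, 2]}

# Build the reassignment JSON that demotes broker 1 everywhere.
plan = {
    "version": 1,
    "partitions": [
        {"topic": t, "partition": p, "replicas": demote_broker(r, 1)}
        for (t, p), r in partitions.items()
    ],
}
print(json.dumps(plan, indent=2))
```

As the quoted comment notes, this has to be repeated for every partition the broker leads, and the original ordering must be saved and restored later, which is exactly the bookkeeping the blacklist proposal is meant to avoid.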