Thanks for the KIP. I have put my comments below. This is a nice improvement to avoid cumbersome maintenance.
>> The following are the requirements this KIP is trying to accomplish: The ability to add and remove the preferred leader deprioritized list/blacklist, e.g. a new ZK path/node or a new dynamic config.

This can be moved to the "Proposed changes" section.

>> The logic to determine the priority/order of which broker should be preferred leader should be modified. The broker in the preferred leader blacklist should be moved to the end (lowest priority) when determining leadership.

I believe there is no change required in the ordering of the preferred replica list. Brokers in the preferred leader blacklist are skipped until the other brokers in the list are unavailable.

>> The blacklist can be at the broker level. However, there might be use cases where a specific topic should blacklist particular brokers, which would be at the Topic level Config. For the use cases of this KIP, it seems that a broker level blacklist would suffice. A topic level preferred leader blacklist might be future enhancement work.

I agree that the broker level preferred leader blacklist would be sufficient. Do you have any use cases which require a topic level preferred blacklist?

You can add the below workaround as an item in the rejected alternatives section: "Reassigning all the topic/partitions which the intended broker is a replica for."

Thanks,
Satish.
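For illustration, the per-partition workaround mentioned above looks roughly like this with the existing tools (a sketch only; the topic name, partition, broker ids, and file names are made up, and the commands assume the current ZooKeeper-based tooling):

  # reassign.json: move broker 1 from the first to the last position, e.g. (1,2,3) -> (2,3,1)
  {"version":1,"partitions":[{"topic":"test-topic","partition":0,"replicas":[2,3,1]}]}

  bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 \
      --reassignment-json-file reassign.json --execute

  # elect.json: {"partitions":[{"topic":"test-topic","partition":0}]}
  bin/kafka-preferred-replica-election.sh --zookeeper localhost:2181 \
      --path-to-json-file elect.json

This has to be repeated for every partition the broker is a replica of, and undone later, which is the tedious maintenance the KIP wants to avoid.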
On Fri, Jul 19, 2019 at 7:33 AM Stanislav Kozlovski <stanis...@confluent.io> wrote:
>
> Hey George,
>
> Thanks for the KIP, it's an interesting idea.
>
> I was wondering whether we could achieve the same thing via the kafka-reassign-partitions tool. As you had also said in the JIRA, it is true that this is currently very tedious with the tool. My thoughts are that we could improve the tool and give it the notion of a "blacklisted preferred leader". This would have some benefits like:
> - more fine-grained control over the blacklist. We may not want to blacklist all the preferred leaders, as that would make the blacklisted broker a follower of last resort, which is not very useful. In the case of an underpowered AWS machine or a controller, you might overshoot and make the broker very underutilized if you completely make it leaderless.
> - it is not permanent. If we are to have a blacklist-leaders config, rebalancing tools would also need to know about it and manipulate/respect it to achieve a fair balance.
> It seems like both problems are tied to balancing partitions; it's just that KIP-491's use case wants to balance them against other factors in a more nuanced way. It makes sense to have both be done from the same place.
>
> To make note of the motivation section:
>
> "Avoid bouncing broker in order to lose its leadership"
> The recommended way to make a broker lose its leadership is to run a reassignment on its partitions.
>
> "The cross-data center cluster has AWS cloud instances which have less computing power"
> We recommend running Kafka on homogeneous machines. It would be cool if the system supported more flexibility in that regard, but that is more nuanced, and a preferred leader blacklist may not be the best first approach to the issue.
>
> Adding a new config which can fundamentally change the way replication is done is complex, both for the system (the replication code is complex enough) and the user. Users would have another potential config that could backfire on them, e.g. if left forgotten.
>
> Could you think of any downsides to implementing this functionality (or a variation of it) in the kafka-reassign-partitions.sh tool? One downside I can see is that we would not have it handle new partitions created after the "blacklist operation". As a first iteration I think that may be acceptable.
>
> Thanks,
> Stanislav
>
> On Fri, Jul 19, 2019 at 3:20 AM George Li <sql_consult...@yahoo.com.invalid> wrote:
> >
> > Hi,
> >
> > Pinging the list for feedback on this KIP-491 (https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=120736982)
> >
> > Thanks,
> > George
> >
> > On Saturday, July 13, 2019, 08:43:25 PM PDT, George Li <sql_consult...@yahoo.com.INVALID> wrote:
> >
> > Hi,
> >
> > I have created KIP-491 (https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=120736982) for putting a broker on the preferred leader blacklist or deprioritized list, so that when determining leadership it is moved to the lowest priority, for some of the listed use cases.
> >
> > Please provide your comments/feedback.
> >
> > Thanks,
> > George
> >
> > ----- Forwarded Message -----
> > From: Jose Armando Garcia Sancio (JIRA) <j...@apache.org>
> > To: "sql_consult...@yahoo.com" <sql_consult...@yahoo.com>
> > Sent: Tuesday, July 9, 2019, 01:06:05 PM PDT
> > Subject: [jira] [Commented] (KAFKA-8638) Preferred Leader Blacklist (deprioritized list)
> >
> > [ https://issues.apache.org/jira/browse/KAFKA-8638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16881511#comment-16881511 ]
> >
> > Jose Armando Garcia Sancio commented on KAFKA-8638:
> > ---------------------------------------------------
> >
> > Thanks for the feedback and clear use cases [~sql_consulting].
> >
> > > Preferred Leader Blacklist (deprioritized list)
> > > -----------------------------------------------
> > >
> > > Key: KAFKA-8638
> > > URL: https://issues.apache.org/jira/browse/KAFKA-8638
> > > Project: Kafka
> > > Issue Type: Improvement
> > > Components: config, controller, core
> > > Affects Versions: 1.1.1, 2.3.0, 2.2.1
> > > Reporter: GEORGE LI
> > > Assignee: GEORGE LI
> > > Priority: Major
> > >
> > > Currently, Kafka preferred leader election will pick the broker_id in the topic/partition replica assignment in priority order when the broker is in the ISR. The preferred leader is the broker id in the first position of the replica list. There are use cases where, even though the first broker in the replica assignment is in the ISR, there is a need for it to be moved to the end of the ordering (lowest priority) when deciding leadership during preferred leader election.
> > > Let's use topic/partition replica (1,2,3) as an example. 1 is the preferred leader. When preferred leader election is run, it will pick 1 as the leader if it is in the ISR; if 1 is not online and in the ISR, then pick 2; if 2 is not in the ISR, then pick 3 as the leader. There are use cases where, even though 1 is in the ISR, we would like it to be moved to the end of the ordering (lowest priority) when deciding leadership during preferred leader election. Below is a list of use cases:
> > > * If broker_id 1 is a swapped failed host and brought up with the last segments or the latest offset without historical data (there is another effort on this), it's better for it to not serve leadership until it has caught up.
> > > * The cross-data center cluster has AWS instances which have less computing power than the on-prem bare metal machines. We could put the AWS broker_ids in the preferred leader blacklist, so on-prem brokers can be elected leaders, without changing the reassignment ordering of the replicas.
> > > * If broker_id 1 is constantly losing leadership after some time: "flapping". We would want to exclude 1 from being a leader unless all other brokers of this topic/partition are offline. The "flapping" effect was seen in the past when 2 or more brokers were bad; when they lost leadership constantly/quickly, the sets of partition replicas they belonged to saw leadership constantly changing. The ultimate solution is to swap out these bad hosts, but for quick mitigation we can also put the bad hosts in the preferred leader blacklist to move their priority of being elected leader to the lowest.
> > > * If the controller is busy serving an extra load of metadata requests and other tasks, we would like to move the controller's leaderships to other brokers to lower its CPU load. Currently, bouncing to lose leadership does not work for the controller, because after the bounce the controller fails over to another broker.
> > > * Avoid bouncing a broker in order to lose its leadership: it would be good if we had a way to specify which broker should be excluded from serving traffic/leadership (without changing the replica assignment ordering by reassignments, even though that's quick) and then run preferred leader election. A bouncing broker will cause temporary URPs (under-replicated partitions), and sometimes other issues. Also, bouncing a broker (e.g. broker_id 1) can temporarily remove all of its leadership, but if another broker (e.g. broker_id 2) fails or gets bounced, some of its leaderships will likely fail over to broker_id 1 for partitions with 3 replicas. If broker_id 1 is in the blacklist, then in such a scenario, even with broker_id 2 offline, the 3rd broker can take leadership.
> > > The current workaround for the above is to change the topic/partition's replica assignment to move broker_id 1 from the first position to the last position and run preferred leader election, e.g. (1, 2, 3) => (2, 3, 1). This changes the replica assignment, and we need to keep track of the original one and restore it when things change (e.g. the controller fails over to another broker, or the swapped empty broker catches up). That's a rather tedious task.
> >
> > --
> > This message was sent by Atlassian JIRA
> > (v7.6.3#76005)
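To make the deprioritization described above concrete, here is a rough sketch in Scala (not actual Kafka controller code; the object and method names are invented for illustration) of how a preferred leader blacklist could change the candidate ordering during preferred leader election without changing the replica assignment:

// Sketch only: in-sync replicas are considered in assignment order,
// with blacklisted brokers pushed to the end (lowest priority).
object PreferredLeaderSketch {

  def candidateOrder(assignment: Seq[Int], isr: Set[Int], blacklist: Set[Int]): Seq[Int] = {
    val live = assignment.filter(isr.contains)
    val (deprioritized, preferred) = live.partition(blacklist.contains)
    preferred ++ deprioritized
  }

  def main(args: Array[String]): Unit = {
    // Assignment (1,2,3), all in ISR, broker 1 blacklisted:
    // broker 2 is elected; broker 1 keeps its position in the assignment.
    println(candidateOrder(Seq(1, 2, 3), isr = Set(1, 2, 3), blacklist = Set(1)))
    // => List(2, 3, 1)

    // Brokers 2 and 3 out of the ISR: the blacklisted broker 1 can still lead.
    println(candidateOrder(Seq(1, 2, 3), isr = Set(1), blacklist = Set(1)))
    // => List(1)
  }
}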