Hi Stanislav,

Thanks for the comments. The proposal we are making is not about optimizing Big-O, but about providing a simpler way of stopping a broker from becoming leader. If we go with making this an option and providing a tool that abstracts moving the broker to the end of the preferred leader list, the tool needs to do this for every partition that broker leads. As noted in the comment above, for a broker that is the leader of 1000 partitions, we would have to do this for all 1000 partitions. Having a blacklist instead simplifies this process, and we can provide monitoring/alerts on such a list.
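The tool-based alternative under discussion amounts to generating a reassignment plan that moves the broker to the end of every replica list it currently heads. A minimal sketch of such a generator (the function name and the input map are hypothetical; the output shape follows the JSON accepted by kafka-reassign-partitions.sh):

```python
def demote_broker(assignments, broker_id):
    """Move broker_id to the end of each replica list where it is the
    preferred (first) replica.

    `assignments` maps (topic, partition) -> replica list; the returned
    dict mirrors the JSON consumed by kafka-reassign-partitions.sh.
    """
    plan = []
    for (topic, partition), replicas in sorted(assignments.items()):
        if replicas and replicas[0] == broker_id:
            # Keep the relative order of the other replicas; demote this one.
            reordered = [r for r in replicas if r != broker_id] + [broker_id]
            plan.append({"topic": topic, "partition": partition,
                         "replicas": reordered})
    return {"version": 1, "partitions": plan}
```

For a broker leading 1000 partitions, this produces a 1000-entry plan, which is the O(N) bookkeeping the thread is debating.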
"This sounds like a bit of a hack. If that is the concern, why not propose a KIP that addresses the specific issue?"

Do you mind shedding some light on what issue you are suggesting we propose a KIP for? Replication is a challenge when we are bringing up a new node. If you have a retention period of 3 days, there is honestly no way to do it via online replication without taking a hit on latency SLAs. Is your ask to find a way to fix replication itself when we bring up a new broker with no data?

"Having a blacklist you control still seems like a workaround given that Kafka itself knows when the topic retention would allow you to switch that replica to a leader"

I am not sure how a single zk path holding a list of broker ids makes anything more complicated.

Thanks,
Harsha

On Mon, Sep 09, 2019 at 3:55 PM, Stanislav Kozlovski < stanis...@confluent.io > wrote:

> I agree with Colin that the same result should be achievable through proper abstraction in a tool. Even if that might be "4xO(N)" operations, that is still not a lot - it is still classified as O(N).
>
>> Let's say a healthy broker is hosting 3000 partitions, of which 1000 are the preferred leaders (leader count is 1000). There is a hardware failure (disk/memory, etc.), and the kafka process crashed. We swap this host with another host but keep the same broker.id. When this new broker comes up, it has no historical data, and we manage to have the current last offsets of all partitions set in the replication-offset-checkpoint (if we don't set them, it could cause crazy ReplicaFetcher pulling of historical data from other brokers and cause high cluster latency and other instabilities), so when Kafka is brought up, it quickly catches up as a follower in the ISR. Note, we have auto.leader.rebalance.enable disabled, so it's not serving any traffic as leader (leader count = 0), even though there are 1000 partitions for which this broker is the preferred leader. We need to make this broker not serve traffic for a few hours or days, depending on the SLA of the topic retention requirement, until it has enough historical data.
>
> This sounds like a bit of a hack. If that is the concern, why not propose a KIP that addresses the specific issue? Having a blacklist you control still seems like a workaround given that Kafka itself knows when the topic retention would allow you to switch that replica to a leader.
>
> I really hope we can come up with a solution that avoids complicating the controller and state machine logic further. Could you please list out the main drawbacks of abstracting this away in the reassignments tool (or a new tool)?
>
> On Mon, Sep 9, 2019 at 7:53 AM Colin McCabe < cmcc...@apache.org > wrote:
>
>> On Sat, Sep 7, 2019, at 09:21, Harsha Chintalapani wrote:
>>
>>> Hi Colin,
>>> Can you give us more details on why you don't want this to be part of Kafka core? You are proposing KIP-500, which will take away ZooKeeper, and writing these interim tools to change the ZooKeeper metadata doesn't make sense to me.
>>
>> Hi Harsha,
>>
>> The reassignment API described in KIP-455, which will be part of Kafka 2.4, doesn't rely on ZooKeeper. This API will stay the same after KIP-500 is implemented.
>>
>>> As George pointed out, there are several benefits to having this in the system itself instead of asking users to hack a bunch of json files to deal with an outage scenario.
>>
>> In both cases, the user just has to run a shell command, right? In both cases, the user has to remember to undo the command later when they want the broker to be treated normally again.
>> And in both cases, the user should probably be running an external rebalancing tool to avoid having to run these commands manually. :)
>>
>> best,
>> Colin
>>
>>> Thanks,
>>> Harsha
>>>
>>> On Fri, Sep 6, 2019 at 4:36 PM George Li < sql_consult...@yahoo.com > wrote:
>>>
>>>> Hi Colin,
>>>>
>>>> Thanks for the feedback. The "separate set of metadata about blacklists" in KIP-491 is just a list of broker ids -- usually 1 or 2, or at most a few, per cluster. That should be easier than keeping json files. E.g., what if we first blacklist broker_id_1, and then another broker, broker_id_2, has issues, and we need to write out another json file to restore later (and in which order)? Using the blacklist, we can just add broker_id_2 to the existing list, and remove whichever broker_id returns to a good state, without worrying about how (i.e., in what order the brokers were blacklisted) to restore.
>>>>
>>>> For the topic-level config, the blacklist will be tied to the topic/partition (e.g. Configs: topic.preferred.leader.blacklist=0:101,102;1:103, where 0 & 1 are the partition numbers and 101, 102, 103 are the blacklisted broker_ids), and it is easy to update/remove, with no need for external json files.
>>>>
>>>> Thanks,
>>>> George
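George's proposed topic-level value packs partition-to-broker pairs into a single string. A sketch of how a consumer of that config might parse it (the config key and the format are KIP-491's proposal, not a shipped Kafka feature):

```python
def parse_blacklist(value):
    """Parse the topic-level value sketched in the KIP-491 discussion,
    e.g. "0:101,102;1:103" -> {0: {101, 102}, 1: {103}}
    (partition number -> set of blacklisted broker ids).
    """
    result = {}
    for clause in value.split(";"):
        partition, brokers = clause.split(":")
        result[int(partition)] = {int(b) for b in brokers.split(",")}
    return result
```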
>>>> On Friday, September 6, 2019, 02:20:33 PM PDT, Colin McCabe < cmcc...@apache.org > wrote:
>>>>
>>>> One possibility would be writing a new command-line tool that would deprioritize a given replica using the new KIP-455 API. Then it could write out a JSON file containing the old priorities, which could be restored when (or if) we needed to do so. This seems like it might be simpler and easier to maintain than a separate set of metadata about blacklists.
>>>>
>>>> best,
>>>> Colin
>>>>
>>>> On Fri, Sep 6, 2019, at 11:58, George Li wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Just want to ping and bubble up the discussion of KIP-491.
>>>>>
>>>>> At large scale, with thousands of brokers across many clusters, frequent hardware failures are common. Although running reassignments to change the preferred leaders is a workaround, it incurs unnecessary additional work compared with the preferred leader blacklist proposed in KIP-491, and it is hard to scale.
>>>>>
>>>>> I am wondering whether others running Kafka at a big scale run into the same problem.
>>>>>
>>>>> Satish,
>>>>>
>>>>> Regarding your previous question about whether there is a use case for a topic-level preferred leader "blacklist", I thought of one use case: improving rebalance/reassignment. A large partition will usually cause performance/stability issues during reassignment, so the plan would be to have, say, the new replica start with the leader's latest offset (this way the replica is almost instantly in the ISR and the reassignment completes), and to put this partition's new replica into the preferred leader "blacklist" in the topic-level config for that partition.
>>>>> After some time (the retention time), this new replica has caught up and is ready to serve traffic, and we update/remove the TopicConfig for this partition's preferred leader blacklist.
>>>>>
>>>>> I will update KIP-491 later for this use case of a topic-level config for the Preferred Leader Blacklist.
>>>>>
>>>>> Thanks,
>>>>> George
>>>>>
>>>>> On Wednesday, August 7, 2019, 07:43:55 PM PDT, George Li < sql_consult...@yahoo.com > wrote:
>>>>>
>>>>> Hi Colin,
>>>>>
>>>>>> In your example, I think we're comparing apples and oranges. You started by outlining a scenario where "an empty broker... comes up... [without] any leadership[s]." But then you criticize using reassignment to switch the order of preferred replicas because it "would not actually switch the leader automatically." If the empty broker doesn't have any leaderships, there is nothing to be switched, right?
>>>>>
>>>>> Let me explain the details of this particular use case, for comparing apples to apples.
>>>>>
>>>>> Let's say a healthy broker is hosting 3000 partitions, of which 1000 are the preferred leaders (leader count is 1000). There is a hardware failure (disk/memory, etc.), and the kafka process crashed. We swap this host with another host but keep the same broker.id. When this new broker comes up, it has no historical data, and we manage to have the current last offsets of all partitions set in the replication-offset-checkpoint (if we don't set them, it could cause crazy ReplicaFetcher pulling of historical data from other brokers and cause high cluster latency and other instabilities), so when Kafka is brought up, it quickly catches up as a follower in the ISR. Note, we have auto.leader.rebalance.enable disabled, so it's not serving any traffic as leader (leader count = 0), even though there are 1000 partitions for which this broker is the preferred leader.
>>>>>
>>>>> We need to make this broker not serve traffic for a few hours or days, depending on the SLA of the topic retention requirement, until it has enough historical data.
>>>>>
>>>>> * The traditional way, using reassignments to move this broker to the end of the assignment for the 1000 partitions where it's the preferred leader, is an O(N) operation, and from my experience we can't submit all 1000 at the same time without causing higher latencies, even though the reassignment in this case can complete almost instantly. After a few hours/days, when this broker is ready to serve traffic, we have to run reassignments again to restore the preferred leaders of those 1000 partitions for this broker: another O(N) operation. Then run preferred leader election: O(N) again. So in total, 3 x O(N) operations. The point is, since the new empty broker is expected to be the same as the old one in terms of hosting partitions/leaders, it would seem unnecessary to do reassignments (reordering of replicas) while the broker is catching up.
>>>>>
>>>>> * With the new Preferred Leader "Blacklist" feature, we just need to put in a dynamic config to indicate that this broker should be considered for leadership (by preferred leader election, broker failover, or unclean leader election) at the lowest priority. NO need to run any reassignments. After a few hours/days, when this broker is ready, remove the dynamic config and run preferred leader election, and this broker will serve traffic for the 1000 partitions it was originally the preferred leader of. So in total, 1 x O(N) operation.
>>>>>
>>>>> If auto.leader.rebalance.enable is enabled, the Preferred Leader "Blacklist" can be put in place before Kafka is started, to prevent this broker from serving traffic. In the traditional way of running reassignments, once the broker is up with auto.leader.rebalance.enable, if leadership starts going to this new empty broker, we might have to run preferred leader election after the reassignments to remove its leaderships. E.g. the (1,2,3) => (2,3,1) reassignment only changes the ordering; 1 remains the current leader, and preferred leader election is needed to change the leader to 2 after the reassignment. So, potentially one more O(N) operation.
>>>>>
>>>>> I hope the above example shows how easy it is to "blacklist" a broker from serving leadership. For someone managing production Kafka clusters, it's important to react fast to certain alerts and mitigate/resolve issues. Together with the other use cases I listed in KIP-491, I think this feature can make the Kafka product easier to manage/operate.
>>>>>
>>>>>> In general, using an external rebalancing tool like Cruise Control is a good idea to keep things balanced without having to deal with manual rebalancing. We expect more and more people who have a complex or large cluster will start using tools like this.
>>>>>>
>>>>>> However, if you choose to do manual rebalancing, it shouldn't be that bad. You would save the existing partition ordering before making your changes, then make your changes (perhaps by running a simple command line tool that switches the order of the replicas). Then, once you felt like the broker was ready to serve traffic, you could just re-apply the old ordering which you had saved.
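The manual save-then-restore flow Colin describes can be sketched as a pair of kafka-reassign-partitions-style JSON plans produced in one pass (the function name and input map are hypothetical):

```python
import json

def make_plans(assignments, broker_id):
    """Return (demotion_json, rollback_json): the first moves broker_id to
    the end of each replica list it appears in, the second restores the
    current ordering. `assignments` maps (topic, partition) -> replica list;
    the JSON shape mirrors kafka-reassign-partitions.sh input."""
    demote, restore = [], []
    for (topic, partition), replicas in sorted(assignments.items()):
        # Skip partitions where the broker is absent or already last.
        if broker_id in replicas and replicas[-1] != broker_id:
            reordered = [r for r in replicas if r != broker_id] + [broker_id]
            demote.append({"topic": topic, "partition": partition,
                           "replicas": reordered})
            restore.append({"topic": topic, "partition": partition,
                            "replicas": replicas})
    return (json.dumps({"version": 1, "partitions": demote}),
            json.dumps({"version": 1, "partitions": restore}))
```

Apply the first plan now and keep the second on disk; that file is exactly the "old priorities" JSON the tool-based approach would have to track per incident.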
>>>>> We do have our own rebalancing tool, which has its own criteria like rack diversity, disk usage, spreading partitions/leaders across all brokers in the cluster per topic, leadership bytes/BytesIn served per broker, etc. We can run reassignments. The point is whether that is really necessary, and whether there is a more effective, easier, safer way to do it.
>>>>>
>>>>> Take another use case example: taking leadership away from a busy controller to give it more capacity to serve metadata requests and other work. The controller can fail over; with the preferred leader "blacklist", we do not have to run reassignments again when the controller fails over, just change the blacklisted broker_id.
>>>>>
>>>>>> I was thinking about a PlacementPolicy filling the role of preventing people from creating single-replica partitions on a node that we didn't want to ever be the leader. I thought that it could also prevent people from designating those nodes as preferred leaders during topic creation, or Kafka from doing it during random topic creation. I was assuming that the PlacementPolicy would determine which nodes were which through static configuration keys. I agree static configuration keys are somewhat less flexible than dynamic configuration.
>>>>>
>>>>> I think the single-replica partition might not be a good example. There should not be any single-replica partitions at all; if there are, it's probably from trying to save disk space with fewer replicas. I think the minimum should be at least 2. A user purposely creating a single-replica partition takes full responsibility for data loss and unavailability when a broker fails or is under maintenance.
>>>>>
>>>>> I think it would be better to use dynamic instead of static config. I also think it would be better to have a topic creation policy enforced in the Kafka server OR in an external service. We have an external/central service managing topic creation/partition expansion which takes into account rack diversity, replication factor (2, 3 or 4 depending on cluster/topic type), the policy for replicating the topic between kafka clusters, etc.
>>>>>
>>>>> Thanks,
>>>>> George
>>>>>
>>>>> On Wednesday, August 7, 2019, 05:41:28 PM PDT, Colin McCabe < cmcc...@apache.org > wrote:
>>>>>
>>>>> On Wed, Aug 7, 2019, at 12:48, George Li wrote:
>>>>>
>>>>>> Hi Colin,
>>>>>>
>>>>>> Thanks for your feedback. Comments below:
>>>>>>
>>>>>>> Even if you have a way of blacklisting an entire broker all at once, you still would need to run a leader election for each partition where you want to move the leader off of the blacklisted broker. So the operation is still O(N) in that sense -- you have to do something per partition.
>>>>>>
>>>>>> For a failed broker swapped with an empty broker: when it comes up, it will not have any leaderships, and we would like it to remain without leaderships for a couple of hours or days. So there is no preferred leader election needed, which would incur an O(N) operation in this case. Putting in the preferred leader blacklist would safeguard this broker from serving traffic during that time; otherwise, if another broker fails (if this broker is 1st or 2nd in the assignment), or someone runs preferred leader election, this new "empty" broker can still get leaderships.
>>>>>>
>>>>>> Also, running reassignment to change the ordering of the preferred leaders would not actually switch the leader automatically, e.g. (1,2,3) => (2,3,1), unless preferred leader election is run to switch the current leader from 1 to 2. So the operation is at least 2 x O(N), and then after the broker is back to normal, another 2 x O(N) to roll back.
>>>>>
>>>>> Hi George,
>>>>>
>>>>> Hmm. I guess I'm still on the fence about this feature.
>>>>>
>>>>> In your example, I think we're comparing apples and oranges. You started by outlining a scenario where "an empty broker... comes up... [without] any leadership[s]." But then you criticize using reassignment to switch the order of preferred replicas because it "would not actually switch the leader automatically." If the empty broker doesn't have any leaderships, there is nothing to be switched, right?
>>>>>
>>>>>>> In general, reassignment will get a lot easier and quicker once KIP-455 is implemented. Reassignments that just change the order of preferred replicas for a specific partition should complete pretty much instantly.
>>>>>>>
>>>>>>> I think it's simpler and easier just to have one source of truth for what the preferred replica is for a partition, rather than two. So for me, the fact that the replica assignment ordering isn't changed is actually a big disadvantage of this KIP. If you are a new user (or just an existing user that didn't read all of the documentation) and you just look at the replica assignment, you might be confused by why a particular broker wasn't getting any leaderships, even though it appeared like it should. More mechanisms mean more complexity for users and developers most of the time.
>>>>>>
>>>>>> I would like to stress the point that running reassignment to change the ordering of the replicas (putting a broker at the end of the partition assignment) is unnecessary, because after some time the broker is caught up and can start serving traffic, and then reassignments need to be run again to "roll back" to the previous state. As I mentioned in KIP-491, this is just tedious work.
>>>>> In general, using an external rebalancing tool like Cruise Control is a good idea to keep things balanced without having to deal with manual rebalancing. We expect more and more people who have a complex or large cluster will start using tools like this.
>>>>>
>>>>> However, if you choose to do manual rebalancing, it shouldn't be that bad. You would save the existing partition ordering before making your changes, then make your changes (perhaps by running a simple command line tool that switches the order of the replicas). Then, once you felt like the broker was ready to serve traffic, you could just re-apply the old ordering which you had saved.
>>>>>
>>>>>> I agree this might introduce some complexities for users/developers. But if this feature is good, and well documented, it is good for the kafka product/community. Just like KIP-460, which enables unclean leader election to override the topic-level/broker-level config `unclean.leader.election.enable`.
>>>>>>
>>>>>>> I agree that it would be nice if we could treat some brokers differently for the purposes of placing replicas, selecting leaders, etc. Right now, we don't have any way of implementing that without forking the broker. I would support a new PlacementPolicy class that would close this gap. But I don't think this KIP is flexible enough to fill this role. For example, it can't prevent users from creating new single-replica topics that get put on the "bad" replica. Perhaps we should reopen the discussion about https://cwiki.apache.org/confluence/display/KAFKA/KIP-201%3A+Rationalising+Policy+interfaces
>>>>>>
>>>>>> Creating topics with a single replica is beyond what KIP-491 is trying to achieve. The user needs to take responsibility for doing that. I do see some Samza clients notoriously creating single-replica topics, and that gets flagged by alerts, because a single broker being down or under maintenance will cause offline partitions. With the KIP-491 preferred leader "blacklist", a single replica will still serve as leader, because there is no alternative replica to be chosen as leader.
>>>>>>
>>>>>> Even with a new PlacementPolicy for topic creation/partition expansion, it would still need the blacklist info (e.g. a zk path node, or a broker-level/topic-level config) to "blacklist" the broker from being preferred leader. Would that be the same as what KIP-491 is introducing?
>>>>>
>>>>> I was thinking about a PlacementPolicy filling the role of preventing people from creating single-replica partitions on a node that we didn't want to ever be the leader. I thought that it could also prevent people from designating those nodes as preferred leaders during topic creation, or Kafka from doing it during random topic creation. I was assuming that the PlacementPolicy would determine which nodes were which through static configuration keys. I agree static configuration keys are somewhat less flexible than dynamic configuration.
>>>>>
>>>>> best,
>>>>> Colin
>>>>>
>>>>>> Thanks,
>>>>>> George
>>>>>>
>>>>>> On Wednesday, August 7, 2019, 11:01:51 AM PDT, Colin McCabe < cmcc...@apache.org > wrote:
>>>>>>
>>>>>> On Fri, Aug 2, 2019, at 20:02, George Li wrote:
>>>>>>
>>>>>>> Hi Colin,
>>>>>>> Thanks for looking into this KIP. Sorry for the late response; I have been busy.
>>>>>>>
>>>>>>> If a cluster has MANY topic partitions, moving this "blacklist" broker to the end of the replica list is still a rather "big" operation, involving submitting reassignments. The KIP-491 way of blacklisting is much simpler/easier and can be undone easily, without changing the replica assignment ordering.
>>>>>>> >>>>>>> >>>>>> >>>>>> >>>>>> >>>>>> Hi George, >>>>>> >>>>>> >>>>>> >>>>>> Even if you have a way of blacklisting an entire broker all at >>>>>> >>>>>> >>>>> >>>>> >>>> >>>> >>> >>> >> >> >> >> once, >> >> >>> >>>> >>>>> >>>>>> >>>>>> >>>>>> you still would need to run a leader election for each partition >>>>>> >>>>>> >>>>> >>>>> >>>> >>>> >>> >>> >> >> >> >> where >> >> >>> >>>> >>>>> >>>>>> >>>>>> >>>>>> you want to move the leader off of the blacklisted broker. So the >>>>>> operation is still O(N) in that sense-- you have to do something >>>>>> >>>>>> >>>>> >>>>> >>>> >>>> >>> >>> >> >> >> >> per >> >> >>> >>>> >>>>> >>>>>> >>>>>> >>>>>> partition. >>>>>> >>>>>> >>>>>> >>>>>> In general, reassignment will get a lot easier and quicker once >>>>>> >>>>>> >>>>> >>>>> >>>> >>>> >>>> >>>> KIP-455 >>>> >>>> >>>>> >>>>>> >>>>>> >>>>>> is implemented. Reassignments that just change the order of >>>>>> >>>>>> >>>>> >>>>> >>>> >>>> >>> >>> >> >> >> >> preferred >> >> >>> >>>> >>>>> >>>>>> >>>>>> >>>>>> replicas for a specific partition should complete pretty much >>>>>> >>>>>> >>>>> >>>>> >>>> >>>> >>>> >>>> instantly. >>>> >>>> >>>>> >>>>>> >>>>>> >>>>>> I think it's simpler and easier just to have one source of truth >>>>>> >>>>>> >>>>> >>>>> >>>> >>>> >>> >>> >> >> >> >> for >> >> >>> >>>> >>>>> >>>>>> >>>>>> >>>>>> what the preferred replica is for a partition, rather than two. So >>>>>> >>>>>> >>>>> >>>>> >>>> >>>> >>>> >>>> for >>>> >>>> >>>>> >>>>>> >>>>>> >>>>>> me, the fact that the replica assignment ordering isn't changed is >>>>>> actually a big disadvantage of this KIP. 
If you are a new user (or just >>>>>> an >>>>>> existing user that didn't read all of the documentation) >>>>>> >>>>>> >>>>> >>>>> >>>> >>>> >>> >>> >> >> >> >> and >> >> >>> >>>> >>>>> >>>>>> >>>>>> >>>>>> you just look at the replica assignment, you might be confused by >>>>>> >>>>>> >>>>> >>>>> >>>> >>>> >>> >>> >> >> >> >> why >> >> >>> >>>> >>>> >>>> a >>>> >>>> >>>>> >>>>>> >>>>>> >>>>>> particular broker wasn't getting any leaderships, even though it appeared >>>>>> like it should. More mechanisms mean more complexity for users and >>>>>> developers most of the time. >>>>>> >>>>>> >>>>>>> >>>>>>> >>>>>>> Major use case for me, a failed broker got swapped with new >>>>>>> >>>>>>> >>>>>> >>>>>> >>>>> >>>>> >>>> >>>> >>>> >>>> hardware, >>>> >>>> >>>>> >>>>>> >>>>>>> >>>>>>> >>>>>>> and starts up as empty (with latest offset of all partitions), >>>>>>> >>>>>>> >>>>>> >>>>>> >>>>> >>>>> >>>> >>>> >>> >>> >> >> >> >> the >> >> >>> >>>> >>>> >>>> SLA >>>> >>>> >>>>> >>>>>> >>>>>>> >>>>>>> >>>>>>> of retention is 1 day, so before this broker is up to be in-sync >>>>>>> >>>>>>> >>>>>> >>>>>> >>>>> >>>>> >>>> >>>> >>> >>> >> >> >> >> for >> >> >>> >>>> >>>> >>>> 1 >>>> >>>> >>>>> >>>>>> >>>>>>> >>>>>>> >>>>>>> day, we would like to blacklist this broker from serving traffic. >>>>>>> >>>>>>> >>>>>> >>>>>> >>>>> >>>>> >>>> >>>> >>>> >>>> after >>>> >>>> >>>>> >>>>>> >>>>>>> >>>>>>> >>>>>>> 1 day, the blacklist is removed and run preferred leader >>>>>>> >>>>>>> >>>>>> >>>>>> >>>>> >>>>> >>>> >>>> >>> >>> >> >> >> >> election. >> >> >>> >>>> >>>>> >>>>>> >>>>>>> >>>>>>> >>>>>>> This way, no need to run reassignments before/after. This is the >>>>>>> "temporary" use-case. >>>>>>> >>>>>>> >>>>>> >>>>>> >>>>>> >>>>>> What if we just add an option to the reassignment tool to generate >>>>>> >>>>>> >>>>> >>>>> >>>> >>>> >>> >>> >> >> >> >> a >> >> >>> >>>> >>>>> >>>>>> >>>>>> >>>>>> plan to move all the leaders off of a specific broker? 
The tool could also run a leader election as well. That would be a simple way of doing this without adding new mechanisms or broker-side configurations, etc.

> There are use-cases where this Preferred Leader "blacklist" can be somewhat permanent, as I explained in the AWS data center instances vs. on-premises data center bare metal machines (heterogeneous hardware) case: the AWS broker_ids will be blacklisted. So new topics created, or existing topic expansion, would not make them serve traffic even though they could be the preferred leader.

I agree that it would be nice if we could treat some brokers differently for the purposes of placing replicas, selecting leaders, etc. Right now, we don't have any way of implementing that without forking the broker. I would support a new PlacementPolicy class that would close this gap.
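As a rough illustration of what such a tool option could emit (a hypothetical helper sketched here, not an actual Kafka API), it would reorder each replica list so the target broker comes last, in the same shape as the reassignment JSON that kafka-reassign-partitions.sh accepts:

```python
# Sketch of the proposed tool option: given the current assignment, emit a
# reassignment plan that moves the target broker to the end of every replica
# list it appears in, so a subsequent preferred leader election picks another
# replica. demote_broker_plan is an illustrative name, not a real Kafka tool.
import json

def demote_broker_plan(assignment, broker_id):
    """assignment: {(topic, partition): [replica ids in preference order]}"""
    partitions = []
    for (topic, partition), replicas in assignment.items():
        # Only partitions where the broker is a replica and not already last
        # need a change; ordering of the other replicas is preserved.
        if broker_id in replicas and replicas[-1] != broker_id:
            reordered = [b for b in replicas if b != broker_id] + [broker_id]
            partitions.append({"topic": topic, "partition": partition,
                               "replicas": reordered})
    return {"version": 1, "partitions": partitions}

plan = demote_broker_plan({("events", 0): [1, 2, 3], ("events", 1): [2, 1, 3]}, 1)
print(json.dumps(plan, indent=2))
```

Note the plan only touches partitions where the broker is currently ahead of another replica, which keeps the generated JSON minimal.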
But I don't think this KIP is flexible enough to fill this role. For example, it can't prevent users from creating new single-replica topics that get put on the "bad" replica. Perhaps we should reopen the discussion about https://cwiki.apache.org/confluence/display/KAFKA/KIP-201%3A+Rationalising+Policy+interfaces

regards,
Colin

> Please let me know if there are more questions.
>
> Thanks,
> George

On Thursday, July 25, 2019, 08:38:28 AM PDT, Colin McCabe <cmcc...@apache.org> wrote:

We still want to give the "blacklisted" broker the leadership if nobody else is available. Therefore, isn't putting a broker on the blacklist pretty much the same as moving it to the last entry in the replicas list and then triggering a preferred leader election?
If we want this to be undone after a certain amount of time, or under certain conditions, that seems like something that would be more effectively done by an external system, rather than putting all these policies into Kafka.

best,
Colin

On Fri, Jul 19, 2019, at 18:23, George Li wrote:

Hi Satish,

Thanks for the reviews and feedbacks.

>> The following is the requirements this KIP is trying to accomplish:
>
> This can be moved to the "Proposed changes" section.

Updated the KIP-491.

>> The logic to determine the priority/order of which broker should be preferred leader should be modified. The broker in the preferred leader blacklist should be moved to the end (lowest priority) when determining leadership.
> I believe there is no change required in the ordering of the preferred replica list. Brokers in the preferred leader blacklist are skipped until other brokers in the list are unavailable.

Yes, the partition assignment remains the same, both replicas and ordering. The blacklist logic can be optimized during implementation.

>> The blacklist can be at the broker level. However, there might be use cases where a specific topic should blacklist particular brokers, which would be at the Topic level Config. For the use cases of this KIP, it seems that a broker level blacklist would suffice. Topic level preferred leader blacklist might be future enhancement work.
>
> I agree that the broker level preferred leader blacklist would be sufficient. Do you have any use cases which require topic level preferred blacklist?

I don't have any concrete use cases for a Topic level preferred leader blacklist.
One scenario I can think of is when a broker has high CPU usage: try to identify the big topics (high MsgIn, high BytesIn, etc.) and move the leaders away from this broker. Before doing an actual reassignment to change its preferred leaders, put this broker in the preferred_leader_blacklist in the Topic Level config, run preferred leader election, and see whether CPU decreases for this broker. If yes, then do the reassignments to make the preferred leader change "permanent" (the topic may have many partitions, like 256, quite a few of which have this broker as preferred leader). So this Topic Level config is an easy way of doing a trial and checking the result.

> You can add the below workaround as an item in the rejected alternatives section: "Reassigning all the topic/partitions which the intended broker is a replica for."

Updated the KIP-491.

Thanks,
George

On Friday, July 19, 2019, 08:20:22 AM PDT, Satish Duggana <satish.dugg...@gmail.com> wrote:

Thanks for the KIP. I have put my comments below.

This is a nice improvement to avoid cumbersome maintenance.

> The following is the requirements this KIP is trying to accomplish:
> The ability to add and remove the preferred leader deprioritized list/blacklist, e.g. a new ZK path/node or new dynamic config.
This can be moved to the "Proposed changes" section.

> The logic to determine the priority/order of which broker should be preferred leader should be modified. The broker in the preferred leader blacklist should be moved to the end (lowest priority) when determining leadership.

I believe there is no change required in the ordering of the preferred replica list. Brokers in the preferred leader blacklist are skipped until other brokers in the list are unavailable.

> The blacklist can be at the broker level. However, there might be use cases where a specific topic should blacklist particular brokers, which would be at the Topic level Config. For the use cases of this KIP, it seems that a broker level blacklist would suffice. Topic level preferred leader blacklist might be future enhancement work.

I agree that the broker level preferred leader blacklist would be sufficient. Do you have any use cases which require topic level preferred blacklist?

You can add the below workaround as an item in the rejected alternatives section: "Reassigning all the topic/partitions which the intended broker is a replica for."

Thanks,
Satish.

On Fri, Jul 19, 2019 at 7:33 AM Stanislav Kozlovski <stanis...@confluent.io> wrote:

Hey George,

Thanks for the KIP, it's an interesting idea.
I was wondering whether we could achieve the same thing via the kafka-reassign-partitions tool. As you had also said in the JIRA, it is true that this is currently very tedious with the tool. My thought is that we could improve the tool and give it the notion of a "blacklisted preferred leader".

This would have some benefits:
- More fine-grained control over the blacklist. We may not want to blacklist all the preferred leaders, as that would make the blacklisted broker a follower of last resort, which is not very useful. In the case of an underpowered AWS machine or a controller, you might overshoot and make the broker very underutilized if you completely make it leaderless.
- It is not permanent. If we are to have a blacklist-leaders config, rebalancing tools would also need to know about it and manipulate/respect it to achieve a fair balance.

It seems like both problems are tied to balancing partitions; it's just that KIP-491's use case wants to balance them against other factors in a more nuanced way. It makes sense to have both be done from the same place.

To make note of the motivation section:

> Avoid bouncing broker in order to lose its leadership

The recommended way to make a broker lose its leadership is to run a reassignment on its partitions.

> The cross-data center cluster has AWS cloud instances which have less computing power

We recommend running Kafka on homogeneous machines. It would be cool if the system supported more flexibility in that regard, but that is more nuanced, and a preferred leader blacklist may not be the best first approach to the issue.

Adding a new config which can fundamentally change the way replication is done is complex, both for the system (the replication code is complex enough) and the user. Users would have another potential config that could backfire on them -- e.g. if left forgotten.

Could you think of any downsides to implementing this functionality (or a variation of it) in the kafka-reassign-partitions.sh tool?
One downside I can see is that we would not have it handle new partitions created after the "blacklist operation". As a first iteration I think that may be acceptable.

Thanks,
Stanislav

On Fri, Jul 19, 2019 at 3:20 AM George Li <sql_consult...@yahoo.com.invalid> wrote:

Hi,

Pinging the list for feedback on KIP-491 (https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=120736982).

Thanks,
George

On Saturday, July 13, 2019, 08:43:25 PM PDT, George Li <sql_consult...@yahoo.com.INVALID> wrote:

Hi,

I have created KIP-491 (https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=120736982) for putting a broker on the preferred leader blacklist, or deprioritized list, so that when determining leadership it is moved to the lowest priority, for some of the listed use-cases.

Please provide your comments/feedbacks.

Thanks,
George

----- Forwarded Message -----
From: Jose Armando Garcia Sancio (JIRA) <j...@apache.org>
To: "sql_consult...@yahoo.com" <sql_consult...@yahoo.com>
Sent: Tuesday, July 9, 2019, 01:06:05 PM PDT
Subject: [jira] [Commented] (KAFKA-8638) Preferred Leader Blacklist (deprioritized list)

[https://issues.apache.org/jira/browse/KAFKA-8638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16881511#comment-16881511]

Jose Armando Garcia Sancio commented on KAFKA-8638:
---------------------------------------------------

Thanks for feedback and clear use cases [~sql_consulting].

> Preferred Leader Blacklist (deprioritized list)
> -----------------------------------------------
>
>                 Key: KAFKA-8638
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8638
>             Project: Kafka
>          Issue Type: Improvement
>          Components: config, controller, core
>    Affects Versions: 1.1.1, 2.3.0, 2.2.1
>            Reporter: GEORGE LI
>            Assignee: GEORGE LI
>            Priority: Major
>
> Currently, the kafka preferred leader election will pick the broker_id in the topic/partition replica assignments in a priority order when the broker is in ISR. The preferred leader is the broker id in the first position of the replica list.
> There are use-cases where, even though the first broker in the replica assignment is in ISR, there is a need for it to be moved to the end of the ordering (lowest priority) when deciding leadership during preferred leader election.
>
> Let's use a topic/partition with replicas (1,2,3) as an example. 1 is the preferred leader. When preferred leader election is run, it will pick 1 as the leader if it is in ISR; if 1 is not online and in ISR, then pick 2; if 2 is not in ISR, then pick 3 as the leader. There are use cases where, even if 1 is in ISR, we would like it to be moved to the end of the ordering (lowest priority) when deciding leadership during preferred leader election. Below is a list of use cases:
>
> * If broker_id 1 is a swapped failed host, brought up with the last segments or the latest offset but without historical data (there is another effort on this), it is better for it not to serve leadership until it has caught up.
>> * A cross-data-center cluster has AWS instances with less computing power
>> than the on-prem bare-metal machines. We could put the AWS broker_ids in
>> the Preferred Leader Blacklist, so on-prem brokers can be elected leaders,
>> without changing the reassignment ordering of the replicas.
>>
>> * Broker_id 1 is constantly losing leadership after some time: "flapping".
>> We would want to exclude 1 from being a leader unless all other brokers of
>> this topic/partition are offline. The "flapping" effect was seen in the
>> past when 2 or more brokers were bad: as they lost leadership constantly
>> and quickly, the sets of partition replicas they belonged to saw leadership
>> change constantly. The ultimate solution is to swap out these bad hosts.
>> But for quick mitigation, we can also put the bad hosts in the Preferred
>> Leader Blacklist to move their priority for being elected as leaders to
>> the lowest.
>>
>> * The controller is busy serving an extra load of metadata requests and
>> other tasks, and we would like to move the controller's leaderships to
>> other brokers to lower its CPU load. Currently, bouncing the broker to
>> lose leadership does not work for the controller, because after the bounce
>> the controller fails over to another broker.
>> * Avoid bouncing a broker in order to lose its leadership: it would be
>> good to have a way to specify which broker should be excluded from serving
>> traffic/leadership (without changing the replica assignment ordering via
>> reassignment, even though that's quick), and then run preferred leader
>> election. A bouncing broker causes temporary URP, and sometimes other
>> issues. Also, a bounce of a broker (e.g. broker_id 1) can temporarily shed
>> all of its leadership, but if another broker (e.g.
>> broker_id 2) fails or gets bounced, some of its leaderships will likely
>> fail over to broker_id 1 on a replica set with 3 brokers. If broker_id 1
>> is in the blacklist, then in such a scenario, even with broker_id 2
>> offline, the 3rd broker can take leadership.
>>
>> The current work-around for the above is to change the topic/partition's
>> replica reassignment to move broker_id 1 from the first position to the
>> last position and run preferred leader election, e.g.
>> (1, 2, 3) => (2, 3, 1). This changes the replica assignment, and we need
>> to keep track of the original one and restore it if things change (e.g.
>> the controller fails over to another broker, or the swapped empty broker
>> catches up). That's a rather tedious task.
>>
>> --
>> This message was sent by Atlassian JIRA
>> (v7.6.3#76005)
>
> --
> Best,
> Stanislav
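For illustration, the work-around quoted above (move broker 1 to the last position in every affected partition, then run preferred leader election) amounts to generating a reassignment plan like the sketch below. The topic names are made up; the JSON shape is the one accepted by kafka-reassign-partitions.sh:

```python
import json

def demote_broker(assignment, broker_id):
    """Move broker_id to the end of the replica list so it has the
    lowest election priority, e.g. [1, 2, 3] -> [2, 3, 1] for broker 1."""
    if broker_id not in assignment:
        return list(assignment)  # nothing to demote
    return [b for b in assignment if b != broker_id] + [broker_id]

# Current assignments for the partitions broker 1 leads (example data).
partitions = {("topicA", 0): [1, 2, 3], ("topicA", 1): [1, 3, 2]}

# Build the reassignment JSON that demotes broker 1 everywhere.
plan = {
    "version": 1,
    "partitions": [
        {"topic": t, "partition": p, "replicas": demote_broker(r, 1)}
        for (t, p), r in partitions.items()
    ],
}
print(json.dumps(plan, indent=2))
```

As the quoted comment notes, this has to be repeated for every partition the broker leads, and the original ordering must be saved and restored later, which is exactly the bookkeeping the blacklist proposal is meant to avoid.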