Hi all,

Community members Jason Gustafson, Colin P. McCabe and I have been
having some offline conversations.

At a high level, KIP-853 solves two problems:
1) How can KRaft detect and recover from disk failures on a minority
of the voters?
2) How can KRaft support a changing set of voter nodes?

I think that problem 2) is a superset of problem 1). The mechanism for
solving problem 2) can be used to solve problem 1). This is the reason
that I decided to design them together and proposed this KIP. Problem
2) adds a further requirement: observers (brokers and new
controllers) must be able to discover the leader. KIP-853 solves this by
returning the endpoint of the leader in all of the KRaft RPCs. There
are some concerns with this approach.

To solve problem 1) we don't need to return the leader's endpoint,
since it is already listed in the controller.quorum.voters property. To
make faster progress on 1) I have decided to create "KIP-856: KRaft
Disk Failure Recovery" that just addresses this problem. I will be
starting a discussion thread for KIP-856 soon.

We can continue the discussion of KIP-853 here. If KIP-856 gets
approved, I will do one of the following:
3) Modify KIP-853 to just describe the improvement needed on top of KIP-856.
4) Create a new KIP and abandon KIP-853. This new KIP will take into
account all of the discussion from this thread.

Thanks!
-- 
-José
