Hi José,

Thanks for the revisions. I'm really excited to see this going forward for Kafka 3.8.

One important piece of feedback that a lot of people have given me is that they really want auto-formatting in KRaft mode. In other words, they want to start up a process and just have it do the right thing, without having to run a special command like "kafka-storage.sh format" to set up the storage directories.

One reason why users want auto-formatting is that ZK mode had it. Of course, ZK mode's auto-formatting is not safe. It can lead to data loss since it breaks the replication invariant that brokers never lose data after ACKing it. But hardly any users are aware of this. All they know is that they want things to work like they did in ZK mode.

Another reason why users want auto-formatting is that it makes it easier to integrate Kafka into systems like Kubernetes, Ansible, Puppet, and so forth. These systems generally let the administrator set a "desired state." They then take a look at the "actual state" and manipulate it until it matches the desired state. These process management systems tend to be oriented around spinning up a new process or dropping in a new config file. They don't like to make RPCs or invoke command-line tools. Of course it's POSSIBLE to make them do this, but it feels awkward, and it's extra work. Even worse, it's work that the integrators tend to get wrong. Most of them don't understand why naively re-formatting a controller storage directory every time it looks empty is a bad idea.

In other words, if we don't implement auto-formatting in Kafka, the integrators will re-invent it outside Kafka. And they'll almost certainly do it incorrectly in a way that may cause metadata loss. So I really do think we should get this right in Kafka itself.

If we can run through a few scenarios here:

1. Restarting a controller with an empty storage directory

The controller can contact the quorum to get the cluster ID and current MV. If the MV doesn't support quorum reconfiguration, it can bail out.
Otherwise, it can re-format itself with a random directory ID. It can then remove (ID, OLD_DIR_ID) from the quorum, and add (ID, NEW_DIR_ID) to the quorum. I think this can all be done automatically without user intervention. If the remove / add steps fail (because the quorum is down, for example), then of course we can just log an exception and bail out.

2. Restarting a broker with an empty storage directory

The broker can contact the quorum to get the cluster ID and current MV. If the MV doesn't support directory IDs, we can bail out. Otherwise, it can reformat itself with a random directory ID and start up. Its old replicas will be correctly treated as gone due to the JBOD logic.

3. Restarting a controller with a damaged metadata directory

I think we can just bail out if the storage directory doesn't look right. Empty is OK. Damaged is not.

4. Bringing up a totally new cluster

I think we need at least one controller node to be formatted, so that we can decide what metadata version to use. Perhaps we should even require a quorum of controller nodes to be explicitly formatted (aka, in practice, people just format them all).

5. Removing a controller

I think in this case, we can have an explicit command. This is similar to the broker case, where we have the "kafka-cluster.sh unregister" command.

best,
Colin

On Mon, Jan 8, 2024, at 10:13, José Armando García Sancio wrote:
> Hi all,
>
> KIP-853: KRaft Controller Membership Changes is ready for another
> round of discussion.
>
> There was a previous discussion thread at
> https://lists.apache.org/thread/zb5l1fsqw9vj25zkmtnrk6xm7q3dkm1v
>
> I have changed the KIP quite a bit since that discussion. The core
> idea is still the same. I changed some of the details to be consistent
> with some of the protocol changes to Kafka since the original KIP. I
> also added a section that better describes the feature's UX.
>
> KIP: https://cwiki.apache.org/confluence/x/nyH1D
>
> Thanks. Your feedback is greatly appreciated!
> --
> -José
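
P.S. As a rough illustration of scenarios 1 through 4 in my reply above, the startup decision could be sketched in Python roughly as follows. This is only a sketch, not Kafka code and not part of the KIP. Every name in it (plan_startup, DirState, QuorumInfo, Plan) is hypothetical, uuid4 is just a stand-in for a real directory ID, and applying the "damaged means bail out" rule to brokers as well as controllers is my assumption.

```python
import uuid
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Role(Enum):
    CONTROLLER = "controller"
    BROKER = "broker"


class DirState(Enum):
    EMPTY = "empty"          # nothing in the storage directory
    FORMATTED = "formatted"  # looks like a valid, formatted directory
    DAMAGED = "damaged"      # non-empty, but doesn't look right


@dataclass(frozen=True)
class QuorumInfo:
    """What a node learns from the quorum (hypothetical fields)."""
    cluster_id: str
    supports_reconfig: bool  # current MV supports quorum reconfiguration
    supports_dir_ids: bool   # current MV supports directory IDs


@dataclass(frozen=True)
class Plan:
    action: str              # "start", "auto_format", or "bail_out"
    reason: str
    new_dir_id: Optional[str] = None


def plan_startup(role: Role, dir_state: DirState,
                 quorum: Optional[QuorumInfo]) -> Plan:
    """Decide what a node does at startup.

    quorum is None when the quorum cannot be contacted, which also covers
    a brand-new cluster where no controller has been formatted yet.
    """
    if dir_state is DirState.DAMAGED:
        # Scenario 3: empty is OK, damaged is not.
        return Plan("bail_out", "storage directory looks damaged")
    if dir_state is DirState.FORMATTED:
        return Plan("start", "storage directory already formatted")

    # From here on the directory is empty (scenarios 1, 2 and 4).
    if quorum is None:
        return Plan("bail_out", "cannot reach quorum for cluster ID and MV")
    if role is Role.CONTROLLER and not quorum.supports_reconfig:
        return Plan("bail_out", "MV does not support quorum reconfiguration")
    if role is Role.BROKER and not quorum.supports_dir_ids:
        return Plan("bail_out", "MV does not support directory IDs")

    # Safe to auto-format with a fresh random directory ID. A controller
    # would then remove (ID, OLD_DIR_ID) from the quorum and add
    # (ID, NEW_DIR_ID); a broker would just start up.
    return Plan("auto_format", f"empty {role.value} directory",
                new_dir_id=str(uuid.uuid4()))
```

For example, plan_startup(Role.CONTROLLER, DirState.EMPTY, None) comes back with action "bail_out", which is the behavior I'd want when someone brings up a totally new cluster without explicitly formatting at least one controller.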