Hi Kevin,

Thanks for the KIP.

Comments:
LC1. It would be good to show the schema of meta.properties v2 and how it
differs from v1, as was done for the API changes.

LC2. Before this KIP, formatting nodes with the correct cluster ID ensured
that brokers/observer controllers would talk to the expected controllers.
After this KIP, it's possible that brokers/observer controllers that skip
formatting could connect to the wrong cluster's controllers, discover the
cluster ID from the fetch response, and persist it. Is this correct? It
looks like this is a risk users need to accept if they don't want to
format these nodes?
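
To make the concern concrete, here is a rough sketch of the leader-side
check as I understand it from Kevin's reply to JR9.1 (the request cluster
id must be absent or must match the leader's). This is Python for
illustration only, not Kafka code, and the function name is made up:

```python
from typing import Optional


def leader_accepts(request_cluster_id: Optional[str], leader_cluster_id: str) -> bool:
    """Hypothetical model of the KRaft leader's cluster id validation.

    Assumption taken from the thread: a request passes when it carries no
    cluster id, or when its cluster id equals the leader's.
    """
    return request_cluster_id is None or request_cluster_id == leader_cluster_id


# A formatted node pointed at the wrong cluster is rejected.
formatted_wrong = leader_accepts("cluster-A", "cluster-B")

# An unformatted node has no cluster id, so any leader accepts it.
unformatted_wrong = leader_accepts(None, "cluster-B")
```

If that reading is right, a node that skipped formatting passes this check
against any cluster, and would then persist whatever cluster id it learns.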

LC3. "If this ID doesn't match the KRaft leader's the leader will reject
requests from the node"
There is a missing word in the sentence.

LC4. "If meta.properties exists without a cluster.id and is V2, it will be
discovered later (described below)"
Is it possible for meta.properties to exist without a cluster.id while
still being V1? If so, how will we handle it?
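
A small sketch of the cases I have in mind, in Python for illustration
only. The enum and function names are hypothetical, and I am assuming the
file version is read from the `version` key of meta.properties; the last
branch is the one the KIP text does not seem to cover:

```python
from enum import Enum
from typing import Mapping


class ClusterIdSource(Enum):
    META_PROPERTIES = "use cluster.id from meta.properties"
    DISCOVER_FROM_LEADER = "discover later from the KRaft leader"
    UNSPECIFIED = "not covered by the KIP text"


def resolve_cluster_id_source(props: Mapping[str, str]) -> ClusterIdSource:
    """Hypothetical classification of an existing meta.properties file."""
    if "cluster.id" in props:
        return ClusterIdSource.META_PROPERTIES
    if props.get("version") == "2":
        # Quoted KIP sentence: V2 without cluster.id is discovered later.
        return ClusterIdSource.DISCOVER_FROM_LEADER
    # V1 (or any other version) without cluster.id: the open question in LC4.
    return ClusterIdSource.UNSPECIFIED
```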

Thank you,
Luke

On Wed, Mar 4, 2026 at 2:39 AM Jun Rao via dev <[email protected]> wrote:

> Hi, Kevin,
>
> Thanks for the reply. The KIP looks good to me now.
>
> Jun
>
> On Tue, Mar 3, 2026 at 4:54 AM Kevin Wu <[email protected]> wrote:
>
> > Hi Jun,
> >
> > Thanks for the reply.
> >
> > RE JR1: "If an existing string can't be converted to uuid, we can fail the
> > node. This shouldn't happen for a well formatted cluster, right?"
> > Currently, you can format a cluster with a non-UUID cluster ID string, and
> > kafka considers this "well-formatted" (i.e. formatting code accepts String,
> > server startup works, and clusterId is a String in-memory etc.). Our
> > documentation references formatting with a UUID cluster id generated via
> > `kafka-storage random-uuid`, but this is not a requirement in the code. If
> > we make this record have a UUID to be consistent with TopicRecord, it is
> > not clear to me what the MV upgrade path is for existing clusters who
> > formatted `meta.properties` with a non-UUID String. We have to write a new
> > UUID cluster id, which violates the invariant that the cluster id cannot
> > change over the lifetime of a cluster.
> >
> > RE JR6: I plan on still requiring bootstrap controllers to format. This
> > means we should not expect a leader to be elected who does not have a
> > cluster id. Bootstrap controllers will fail when reading in meta.properties
> > in KafkaRaftServer. I will remove this section.
> >
> > RE JR7: Apologies, I mixed up the numbers with another KIP.
> >
> > RE JR8:
> > For brokers, the readers of cluster id during startup are the
> > BrokerLifecycleManager, KafkaApis, DynamicTopicClusterQuotaPublisher, and
> > endpointReadyFutures. It is okay to block startup on fetching the cluster
> > id from KRaft, since we also block startup on broker lifecycle manager
> > initial catch up future. Discovering the cluster id value for the first
> > time would only require a single FetchSnapshot or a Fetch of the bootstrap
> > metadata records.
> >
> > For controllers, the readers are endpointReadyFutures,
> > QuorumController, ControllerApis, ControllerRegistrationManager, and
> > DynamicTopicClusterQuotaPublisher. For bootstrap controllers, this blocking
> > does not occur. For observers, they are essentially brokers from the
> > perspective of KRaft, so I think it is okay to block even the
> > initialization of QuorumController until the cluster id is discovered. Just
> > like with brokers, we only block for 1 successful Fetch/Fetch Snapshot loop
> > until this data is known. One detail is that for auto-joining observers in
> > kraft.version=1, they need to wait until they persist cluster id before
> > they try to join the voter set.
> >
> > RE JR9.1:
> > This can also mean the broker skipped formatting, and does not have a
> > cluster id. In this case, it will persist cluster id to meta.properties.
> >
> > The other case is when the broker has a cluster.id in meta.properties. In
> > this case, the broker cannot discover a different cluster id via a
> > ClusterIdRecord in FetchResponse. In fact, the broker will not be able to
> > successfully complete any KRaft RPCs against the leader. For the broker to
> > receive a non-error FetchResponse with metadata records (which would be the
> > only way to learn of a different ClusterIdRecord), the KRaft leader checks
> > that the request cluster id is absent, or that the request cluster id
> > matches its own (which is the cluster id in its
> > meta.properties/ClusterIdRecord if the invariant I mentioned in my previous
> > message is enforced properly). This case could happen when bootstrap
> > endpoints point to the wrong cluster during restart of a node. The logic
> > above would result in startup timing out and shutting down the node because
> > the local node is not able to participate in KRaft for another cluster.
> >
> > RE JR9.2: Yes, the broker's startup will eventually timeout and fail. The
> > broker won't have cluster.id in meta.properties, and the cluster cannot
> > send the broker a cluster id via ClusterIdRecord. The same would apply for
> > an observer controller. This is a misconfiguration in my opinion.
> >
> > On Tue, Mar 3, 2026 at 12:22 AM Jun Rao via dev <[email protected]> wrote:
> >
> > > Hi, Kevin,
> > >
> > > Thanks for the reply.
> > >
> > > JR1. ClusterIdRecord:
> > > It would be better for ClusterId to have the type uuid. This will make it
> > > consistent with topicId in TopicRecord. If an existing string can't be
> > > converted to uuid, we can fail the node. This shouldn't happen for a well
> > > formatted cluster, right?
> > >
> > > JR6. Have you decided what to include in this KIP? If this KIP still
> > > requires the formatting for bootstrap controllers, what's described here
> > > can't happen.
> > >
> > > JR7. "After KIP-1286, kafka operators no longer need to format all nodes"
> > > KIP-1286 seems to be the wrong KIP?
> > >
> > > JR8. "The readers of cluster id initialized during startup can wait for
> > > both the above before being initialized."
> > > What are those readers? Are they ok to block?
> > >
> > > JR9. A couple more upgrade scenarios.
> > > JR9.1 If the MV has been bumped, after a broker starts up, it discovers
> > > that the clusterId in ClusterIdRecord doesn't match the one in
> > > meta.properties. Will the broker fail?
> > > JR9.2 If the MV hasn't been bumped, a new broker with the new version of
> > > the software is started without formatting, will it fail during startup?
> > >
> > > Jun
> > >
> > > On Wed, Feb 18, 2026 at 8:49 AM Kevin Wu <[email protected]> wrote:
> > >
> > > > Hi Jun,
> > > >
> > > > Thanks for the replies and questions.
> > > >
> > > > RE JR1: Updated the KIP with the record schema for ClusterIdRecord. One
> > > > thing I'm not sure about yet is whether or not the record field should be
> > > > of UUID or String type. This is because kafka's quickstart docs refer to
> > > > setting `--cluster-id` to a UUID in the storage tool. However, many places
> > > > in kafka broker/controller code (e.g. the raft client, broker lifecycle
> > > > manager, and even the formatter itself) only require this type to be a
> > > > String. Since not all Strings are valid UUIDs, making this record field of
> > > > type UUID might be too restrictive and complicate upgrading the MV for
> > > > existing clusters, since they might have a non-UUID cluster id string, but
> > > > need to write this record when upgrading to an MV that supports this
> > > > feature. Let me know what you think.
> > > >
> > > > RE JR2: Any controller node formatted with `--standalone,
> > > > --initial-controllers` or who is part of the static voter set defined by
> > > > `controller.quorum.voters` can write the ClusterIdRecord by including the
> > > > `--cluster-id` argument to `kafka-storage format`. However, if the MV of
> > > > the cluster supports it, there is exactly one writer of this record to the
> > > > cluster metadata partition. The writer is the first active controller, who
> > > > writes this record alongside other bootstrap metadata records (e.g.
> > > > metadata version) during controller activation. At this point, we already
> > > > depend on MV existing, since the active controller writes these bootstrap
> > > > metadata records as a transaction if the MV supports it. I think writing
> > > > the cluster id record would follow a similar pattern.
> > > >
> > > > RE JR3: When a node formats, it will write the meta.properties file. During
> > > > formatting, a node must resolve the MV it wants to format with, which is
> > > > explained more in RE JR5. I need to think about this more, but I think we
> > > > should keep `--cluster-id` as a required flag for invoking the format
> > > > command. If a broker/observer controller does not format, meta.properties
> > > > is written without cluster id immediately after startup (i.e. where we read
> > > > it from disk now in KafkaRaftServer).
> > > >
> > > > RE JR4: Yeah, will do. In this context, when I say observers I'm referring
> > > > to any controllers who are not part of the KRaft voter set when they start
> > > > kafka, or any brokers. I will make this explicit in the KIP. From the
> > > > perspective of this feature and KRaft leader election, controller nodes who
> > > > format with `--no-initial-controllers`, controller nodes who are not part
> > > > of `controller.quorum.voters`, and brokers, all do not "need" to format,
> > > > since they cannot become the active controller. This means they can resolve
> > > > metadata like the cluster id after discovering the leader. We have a
> > > > similar pattern with how controller nodes who format with
> > > > `--no-initial-controllers` discover the kraft version of the cluster.
> > > >
> > > > RE JR5: If a node formats, it must resolve a metadata version with which to
> > > > format. This comes from the `--release-version/--feature` flag and defaults
> > > > to the latest production MV. Therefore, when a node formats with a metadata
> > > > version that supports this feature, it will write the ClusterIdRecord to
> > > > its `0-0/bootstrap.checkpoint`. If the node formats with a metadata version
> > > > that does not support this feature, it does not write ClusterIdRecord to
> > > > its `0-0/bootstrap.checkpoint`. If a node skips formatting, it is assumed
> > > > that this node is part of a cluster whose MV supports this. Otherwise, this
> > > > is a misconfiguration and the node will fail to register with the leader
> > > > since there is no way for it to persist cluster id to its meta.properties
> > > > without formatting.
> > > >
> > > > Although I did not specify this yet on the KIP explicitly, after some
> > > > offline discussion I think it makes sense to enforce the following
> > > > invariant as part of the feature design: if the persisted metadata version
> > > > supports this feature, the ClusterId record must also be persisted. This is
> > > > enforceable on the write-path for MV, which occurs at two points-- during
> > > > formatting and during feature upgrades. There is a similar pattern with
> > > > kraft.version, as it gets written to disk at the same two points.
> > > >
> > > > RE JR6: The main motivation for writing cluster id to meta.properties as
> > > > well is because it can act as a projection of the cluster metadata
> > > > partition which essentially only exposes the cluster id to readers. For
> > > > example, the raft layer needs to be aware of the cluster id for its own RPC
> > > > handling/validation, but raft cannot read metadata records. There are many
> > > > readers of this cluster id value during the startup of the cluster.
> > > > Therefore, avoiding a read of the metadata partition to discover the value
> > > > of this metadata will prevent more complications of the startup code.
> > > >
> > > > Best,
> > > > Kevin Wu
> > > >
> > > >
> > > > On Tue, Feb 17, 2026 at 7:35 PM Jun Rao via dev <[email protected]> wrote:
> > > >
> > > > > Hi, Kevin,
> > > > >
> > > > > Thanks for the KIP. A few comments.
> > > > >
> > > > > JR1. ClusterIdRecord : Could you define the record format?
> > > > >
> > > > > JR2. "a new MetadataVersion that supports encoding/decoding this record.
> > > > > This means that during formatting, the bootstrap ClusterIdRecord is only
> > > > > written if the cluster is formatted with a MV that supports this feature."
> > > > > Could you describe who writes the ClusterIdRecord? Is it the leader
> > > > > controller? Also, when is the record written? Do we guarantee that MV is
> > > > > available at that time?
> > > > >
> > > > > JR3. "meta.properties can be written during kafka broker/controller startup
> > > > > if it doesn't exist already (from formatting)"
> > > > > Could you describe when meta.properties is written? Is MV available at that
> > > > > time?
> > > > >
> > > > > JR4. "Introduce a metadata record for cluster id + observers persist
> > > > > cluster id to meta.properties from metadata publishing pipeline"
> > > > > Could you clarify what observers are? Are they observer controllers or are
> > > > > they brokers (which are referred to as observers to the controller)?
> > > > >
> > > > > JR5. "Bootstrap controllers can add a mandatory “cluster id” record during
> > > > > formatting"
> > > > > This sounds like adding a ClusterIdRecord is optional. If so, could you
> > > > > describe when a record will be added and when a record will not be added?
> > > > >
> > > > > JR6. "However, kafka should still be able to handle the case where a leader
> > > > > is elected without a cluster id in meta.properties, since KRaft does not
> > > > > need cluster.id in order to elect a leader.
> > > > > In this case, the active controller will write a cluster id
> > > > > record during the bootstrap metadata write."
> > > > > Hmm, earlier, the KIP says "Upon discovering the cluster ID for the first
> > > > > time, these nodes need to persist this to meta.properties". Why do we need
> > > > > to introduce a separate place to write the cluster id to meta.properties.
> > > > >
> > > > > Jun
> > > > >
> > > > >
> > > > > On Wed, Feb 11, 2026 at 10:21 AM Kevin Wu <[email protected]> wrote:
> > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > Manually bumping this thread after finalizing a design.
> > > > > > KIP link:
> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1262%3A+Enable+auto-formatting+directories
> > > > > >
> > > > > > Best,
> > > > > > Kevin Wu
> > > > > >
> > > > > > On Tue, Jan 6, 2026 at 7:18 AM Kevin Wu <[email protected]> wrote:
> > > > > >
> > > > > > > Hello all,
> > > > > > >
> > > > > > > I would like to start a discussion on KIP-1262, which proposes removing
> > > > > > > the formatting requirement for brokers and observer controllers. Currently,
> > > > > > > I am considering two high-level designs, and would appreciate community
> > > > > > > feedback on both approaches to decide on a final design.
> > > > > > >
> > > > > > > KIP link:
> > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1262%3A+Enable+auto-formatting+directories
> > > > > > >
> > > > > > > Best,
> > > > > > > Kevin Wu
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
