Hi all,

While implementing the feature in KRaft mode, I found a couple of things we
need to change in the original proposal:

(1) In the KIP of "Disablement - KRaft backed Cluster
<https://cwiki.apache.org/confluence/display/KAFKA/KIP-950%3A++Tiered+Storage+Disablement#KIP950:TieredStorageDisablement-Disablement-KRaftBackedCluster>",
we said:
Controller persists configuration change and completes disablement:

   a. The controller creates a ConfigRecord and persists it in the metadata
   topic.
   b. The controller creates a TopicRecord to increment the tiered_epoch
   and update the tiered_state to DISABLED state.
   c. This update marks the completion of the disablement process,
   indicating that tiered storage has been successfully disabled for the
   KRaft-backed clusters. Similar to topic deletion, all replicas will
   eventually pick up the changes from the cluster metadata topic and apply
   them to their own state. Any deletion failures will be picked up by the
   expiration threads, which should be deleting data before the log start
   offset. If the retention policy is delete, a new expiration thread will be
   started on leadership change on any historical tiered topic to confirm that
   there aren't any leftover segments in remote which need deletion. After a
   cycle in which it didn't delete anything, it will die.

For step (b), I don't think the controller needs to create a TopicRecord,
because:
1. The broker can fetch the "tiered_state" from the ConfigRecord.
2. The "tiered_epoch" is not necessary because the Raft protocol keeps the
ordering for us. The broker can rely on the Raft protocol and apply the
records in order to get the expected result.
3. Marking the completion of the disablement process is not necessary in
KRaft, because once the ConfigRecord is accepted by the controller, it must
be applied by all the observers "in order".

So, I'd like to propose removing step (b) in KRaft mode.
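
To illustrate the ordering argument, here is a hypothetical, simplified
sketch (made-up class and field names, not the real Kafka metadata classes):
every observer replays the metadata log in the same order, so the last
applied ConfigRecord value wins and no separate tiered_epoch is needed.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: ConfigChange stands in for the subset of ConfigRecord
// fields relevant here.
public class TieredStateTracker {
    record ConfigChange(String topic, String configName, String value) {}

    private final Map<String, Boolean> tieredStateByTopic = new ConcurrentHashMap<>();

    // Called once per record replayed from the metadata log, in log order.
    public void replay(ConfigChange change) {
        if ("remote.storage.enable".equals(change.configName())) {
            // The last applied value wins because all observers replay the
            // log in the same order; no separate epoch is needed.
            tieredStateByTopic.put(change.topic(),
                Boolean.parseBoolean(change.value()));
        }
    }
}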

(2) The current configuration is confusing for both users and implementers.
This is what we originally proposed in KIP-950:

remote.storage.enable | remote.log.disable.policy (new) | remote storage data
----------------------+---------------------------------+-----------------------------
true                  | null/retain/delete              | uploadable + readable
false                 | retain (default)                | readable
false                 | delete                          | all remote data are deleted

The second row is the problem: the data is still readable, but remote
storage is "disabled"? Users would be surprised that such a topic still
reads data from remote storage.

Note: this also makes development difficult, because we are unable to
distinguish between:

(1) a topic that never enabled remote storage

(2) a topic that enabled and then disabled remote storage

One problem shows up when the broker starts up and tries to set the log
start offset. Since remote storage is disabled, we would normally set it to
the "local log start offset", but in case (2) we expect the topic to be
treated as if remote storage were enabled, which is confusing (see the
sketch below).
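As a hypothetical sketch of that ambiguity (the class, method, and parameter
names are made up for illustration; this is not actual Kafka code):

// Hypothetical sketch, not actual Kafka code.
public class LogStartOffsetChooser {
    // With only remote.storage.enable=false + disable.policy=retain, a
    // never-tiered topic (case 1) and a previously tiered topic (case 2)
    // look identical from the configs, so this method cannot do the right
    // thing for case (2), which still needs the remote offset.
    static long initialLogStartOffset(boolean remoteStorageEnable,
                                      long localLogStartOffset,
                                      long earliestRemoteOffset) {
        return remoteStorageEnable ? earliestRemoteOffset : localLogStartOffset;
    }
}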


Therefore, Kamal and I would like to propose a new version of the
configuration:

remote.storage.enable | remote.copy.disabled (new) | remote storage data
----------------------+----------------------------+-----------------------------
true                  | false (default)            | uploadable + readable
true                  | true                       | readable
false                 | true/false                 | all remote data are deleted

The advantage is that this config makes it clear to users what they are
configuring, and the result matches their expectations.
Also, on the implementation side, we can still rely on
"remote.storage.enable" to identify whether the feature is on or off.

Any thoughts about it?

Thank you.
Luke



On Thu, May 30, 2024 at 6:50 PM David Jacot <dja...@confluent.io.invalid>
wrote:

> Hi all,
>
> Thanks for the KIP. This is definitely a worthwhile feature. However, I am
> a bit sceptical on the ZK part of the story. The 3.8 release is supposed to
> be the last one supporting ZK so I don't really see how we could bring it
> to ZK, knowing that we don't plan to do a 3.9 release (current plan). I
> strongly suggest clarifying this before implementing the ZK part in order
> to avoid having new code [1] being deleted right after 3.8 is released
> :). Personally, I agree with Chia-Ping and Mickael. We could drop the ZK
> part.
>
> [1] https://github.com/apache/kafka/pull/16131
>
> Best,
> David
>
> On Tue, May 28, 2024 at 1:31 PM Mickael Maison <mickael.mai...@gmail.com>
> wrote:
>
> > Hi,
> >
> > I agree with Chia-Ping, I think we could drop the ZK variant
> > altogether, especially if this is not going to make it in 3.8.0.
> > Even if we end up needing a 3.9.0 release, I wouldn't write a bunch of
> > new ZooKeeper-related code in that release to delete it all right
> > after in 4.0.
> >
> > Thanks,
> > Mickael
> >
> > On Fri, May 24, 2024 at 5:03 PM Christo Lolov <christolo...@gmail.com>
> > wrote:
> > >
> > > Hello!
> > >
> > > I am closing this vote as ACCEPTED with 3 binding +1 (Luke, Chia-Ping
> and
> > > Satish) and 1 non-binding +1 (Kamal) - thank you for the reviews!
> > >
> > > Realistically, I don't think I have the bandwidth to get this in 3.8.0.
> > > Due to this, I will tentatively mark the ZooKeeper part for 3.9 if the
> > > community decides that they do in fact want one more 3.x release.
> > > I will mark the KRaft part as ready to be started and aiming for either
> > 4.0
> > > or 3.9.
> > >
> > > Best,
> > > Christo
> >
>
