Hi all, While implementing the feature in KRaft mode, I found something we need to change the original proposal:
(1) In the KIP of "Disablement - KRaft backed Cluster <https://cwiki.apache.org/confluence/display/KAFKA/KIP-950%3A++Tiered+Storage+Disablement#KIP950:TieredStorageDisablement-Disablement-KRaftBackedCluster>", we said: Controller persists configuration change and completes disablement: 1. The controller creates a ConfigRecord and persists it in the metadata topic. 2. The controller creates a TopicRecord to increment the tiered_epoch and update the tiered_state to DISABLED state. 3. This update marks the completion of the disablement process, indicating that tiered storage has been successfully disabled for the KRaft-backed clusters. Similar to topic deletion all replicas will eventually pick up the changes from the cluster metadata topic and apply them to their own state. Any deletion failures will be picked up by the expiration threads which should be deleting data before the log start offset. If the retention policy is delete, a new expiration thread will be started on leadership change on any historical tiered topic to confirm that there aren't any leftover segments in remote which need deletion. After a cycle in which it didn't delete anything, it will die. For the (b) step, I don't think the controller needs to create TopicRecord because: 1. The broker can fetch the "tiered_state" from the ConfigRecord 2. The "tiered_epoch" is not necessary because raft protocol will keep the order for us. The broker can rely on the raft protocol and apply them in order, to get the expected results. 3. Marking the completion of the disablement process. In KRaft, it's not necessary because once the ConfigRecord is accepted by the controller, it must be applied by all the observers "in order". So, I'd like to propose to remove the (b) step in KRaft mode. (2) Current configuration make users and implementation confusion. This is what originally we proposed in KIP-950: remote.storage.enable remote.log.disable.policy(new) remote storage data true null/retain/delete uploadable + readable false retain (default) readable, but remote storage is disabled? For users, they are also surprised if this topic is reading data from remote storage. Note: This also makes development difficult because it’s unable to distinguish between: (1) a topic never enables remote storage (2) a topic enabled and then disabled remote storage A problem we have is when broker startup and trying to set log start offset. Since the remote storage is disabled, we originally should set to “local log start offset”, but in case (2), we expect it to treat it as “remote storage enabled”, which is confusing. false delete All remote data are deleted Therefore, Kamal and I would like to propose a new version of the configuration: remote.storage.enable remote.copy.disabled (new) remote storage data true false (default) uploadable + readable true true readable false true/false All remote data are deleted The advantage is this config makes users clear what it is configuring, and the result is expected. Also, on the implementation side, we can still rely on "remote.storage.enable" to identify is this feature is on/off. Any thoughts about it? Thank you. Luke On Thu, May 30, 2024 at 6:50 PM David Jacot <dja...@confluent.io.invalid> wrote: > Hi all, > > Thanks for the KIP. This is definitely a worthwhile feature. However, I am > a bit sceptical on the ZK part of the story. The 3.8 release is supposed to > be the last one supporting ZK so I don't really see how we could bring it > to ZK, knowing that we don't plan to do a 3.9 release (current plan). I > strongly suggest clarifying this before implementing the ZK part in order > to avoid having new code [1] being deleted right after 3.8 is released > :). Personally, I agree with Chia-Ping and Mickael. We could drop the ZK > part. > > [1] https://github.com/apache/kafka/pull/16131 > > Best, > David > > On Tue, May 28, 2024 at 1:31 PM Mickael Maison <mickael.mai...@gmail.com> > wrote: > > > Hi, > > > > I agree with Chia-Ping, I think we could drop the ZK variant > > altogether, especially if this is not going to make it in 3.8.0. > > Even if we end up needing a 3.9.0 release, I wouldn't write a bunch of > > new ZooKeeper-related code in that release to delete it all right > > after in 4.0. > > > > Thanks, > > Mickael > > > > On Fri, May 24, 2024 at 5:03 PM Christo Lolov <christolo...@gmail.com> > > wrote: > > > > > > Hello! > > > > > > I am closing this vote as ACCEPTED with 3 binding +1 (Luke, Chia-Ping > and > > > Satish) and 1 non-binding +1 (Kamal) - thank you for the reviews! > > > > > > Realistically, I don't think I have the bandwidth to get this in 3.8.0. > > > Due to this, I will mark tentatively the Zookeeper part for 3.9 if the > > > community decides that they do in fact want one more 3.x release. > > > I will mark the KRaft part as ready to be started and aiming for either > > 4.0 > > > or 3.9. > > > > > > Best, > > > Christo > > >