Hi all,

I just found that the tables don't display correctly in the email. I've put
the table content in a Google Doc here:
<https://docs.google.com/document/d/1Y_cSkXr-qQiFFlFoGqfzGHE9m9MnIvZSgGpFP5l5o4I/edit?usp=sharing>
Thanks.
Luke

On Thu, Jul 25, 2024 at 6:30 PM Luke Chen <show...@gmail.com> wrote:

> Hi all,
>
> While implementing the feature in KRaft mode, I found some things we need
> to change in the original proposal:
>
> (1) In the KIP section "Disablement - KRaft backed Cluster
> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-950%3A++Tiered+Storage+Disablement#KIP950:TieredStorageDisablement-Disablement-KRaftBackedCluster>",
> we said:
>
> Controller persists configuration change and completes disablement:
>
> (a) The controller creates a ConfigRecord and persists it in the metadata
> topic.
> (b) The controller creates a TopicRecord to increment the tiered_epoch
> and update the tiered_state to the DISABLED state.
> (c) This update marks the completion of the disablement process,
> indicating that tiered storage has been successfully disabled for
> KRaft-backed clusters. Similar to topic deletion, all replicas will
> eventually pick up the changes from the cluster metadata topic and apply
> them to their own state. Any deletion failures will be picked up by the
> expiration threads, which should be deleting data before the log start
> offset. If the retention policy is delete, a new expiration thread will
> be started on leadership change on any historical tiered topic to confirm
> that there aren't any leftover segments in remote storage which need
> deletion. After a cycle in which it didn't delete anything, it will die.
>
> For step (b), I don't think the controller needs to create a TopicRecord,
> because:
> 1. The broker can fetch the "tiered_state" from the ConfigRecord.
> 2. The "tiered_epoch" is not necessary, because the Raft protocol keeps
> the ordering for us. The broker can rely on the Raft protocol and apply
> the records in order to get the expected result.
> 3. Marking the completion of the disablement process is not necessary in
> KRaft: once the ConfigRecord is accepted by the controller, it must be
> applied by all the observers "in order".
>
> So, I'd like to propose removing step (b) in KRaft mode.
>
> (2) The current configuration is confusing for both users and
> implementers. This is what we originally proposed in KIP-950:
>
> remote.storage.enable | remote.log.disable.policy (new) | remote storage data
> ----------------------|---------------------------------|-----------------------------
> true                  | null/retain/delete              | uploadable + readable
> false                 | retain (default)                | readable
> false                 | delete                          | all remote data are deleted
>
> The "false + retain" row is the problem: the data stays readable, but
> remote storage is "disabled"? Users are surprised that such a topic is
> still reading data from remote storage.
>
> Note: this also makes development difficult, because we are unable to
> distinguish between:
> (1) a topic that never enabled remote storage
> (2) a topic that enabled and then disabled remote storage
>
> One problem shows up at broker startup when setting the log start offset:
> since remote storage is disabled, we would normally use the "local log
> start offset", but in case (2) we need to treat the topic as if "remote
> storage is enabled", which is confusing.
>
> Therefore, Kamal and I would like to propose a new version of the
> configuration:
>
> remote.storage.enable | remote.copy.disabled (new) | remote storage data
> ----------------------|----------------------------|-----------------------------
> true                  | false (default)            | uploadable + readable
> true                  | true                       | readable
> false                 | true/false                 | all remote data are deleted
>
> The advantage is that this config makes it clear to users what they are
> configuring, and the result matches their expectations. Also, on the
> implementation side, we can still rely on "remote.storage.enable" to
> identify whether the feature is on or off.
>
> Any thoughts about it?
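>
> To make the proposal concrete, here is a minimal sketch of how a user
> could flip the new flag with the Java Admin client. It assumes
> "remote.copy.disabled" ships as an ordinary topic-level config with the
> semantics in the table above, and it uses a placeholder bootstrap address
> and topic name; treat it as an illustration of the proposal, not a final
> interface:
>
>     import java.util.List;
>     import java.util.Map;
>     import org.apache.kafka.clients.admin.Admin;
>     import org.apache.kafka.clients.admin.AlterConfigOp;
>     import org.apache.kafka.clients.admin.ConfigEntry;
>     import org.apache.kafka.common.config.ConfigResource;
>
>     public class DisableRemoteCopy {
>         public static void main(String[] args) throws Exception {
>             // Placeholder bootstrap address and topic name.
>             try (Admin admin = Admin.create(
>                     Map.of("bootstrap.servers", "localhost:9092"))) {
>                 ConfigResource topic =
>                     new ConfigResource(ConfigResource.Type.TOPIC, "my-topic");
>                 // Stop copying new segments to remote storage while keeping
>                 // already-tiered data readable. "remote.copy.disabled" is
>                 // the name proposed above, not an existing config.
>                 AlterConfigOp op = new AlterConfigOp(
>                     new ConfigEntry("remote.copy.disabled", "true"),
>                     AlterConfigOp.OpType.SET);
>                 admin.incrementalAlterConfigs(
>                     Map.of(topic, List.of(op))).all().get();
>             }
>         }
>     }
>
> The kafka-configs.sh equivalent would be --alter --add-config
> remote.copy.disabled=true on the topic; setting
> remote.storage.enable=false afterwards would then delete the remote data,
> as in the last row of the table.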
> Thank you.
> Luke
>
> On Thu, May 30, 2024 at 6:50 PM David Jacot <dja...@confluent.io.invalid>
> wrote:
>
>> Hi all,
>>
>> Thanks for the KIP. This is definitely a worthwhile feature. However, I
>> am a bit sceptical about the ZK part of the story. The 3.8 release is
>> supposed to be the last one supporting ZK, so I don't really see how we
>> could bring this feature to ZK, knowing that we don't plan to do a 3.9
>> release (current plan). I strongly suggest clarifying this before
>> implementing the ZK part, in order to avoid having new code [1] deleted
>> right after 3.8 is released :). Personally, I agree with Chia-Ping and
>> Mickael: we could drop the ZK part.
>>
>> [1] https://github.com/apache/kafka/pull/16131
>>
>> Best,
>> David
>>
>> On Tue, May 28, 2024 at 1:31 PM Mickael Maison <mickael.mai...@gmail.com>
>> wrote:
>>
>> > Hi,
>> >
>> > I agree with Chia-Ping; I think we could drop the ZK variant
>> > altogether, especially if this is not going to make it into 3.8.0.
>> > Even if we end up needing a 3.9.0 release, I wouldn't write a bunch of
>> > new ZooKeeper-related code in that release just to delete it all right
>> > after in 4.0.
>> >
>> > Thanks,
>> > Mickael
>> >
>> > On Fri, May 24, 2024 at 5:03 PM Christo Lolov <christolo...@gmail.com>
>> > wrote:
>> > >
>> > > Hello!
>> > >
>> > > I am closing this vote as ACCEPTED with 3 binding +1s (Luke,
>> > > Chia-Ping and Satish) and 1 non-binding +1 (Kamal) - thank you for
>> > > the reviews!
>> > >
>> > > Realistically, I don't think I have the bandwidth to get this into
>> > > 3.8.0. Due to this, I will tentatively mark the ZooKeeper part for
>> > > 3.9, if the community decides that they do in fact want one more 3.x
>> > > release. I will mark the KRaft part as ready to be started, aiming
>> > > for either 4.0 or 3.9.
>> > >
>> > > Best,
>> > > Christo