Correction to step (2): Wait for all the remote segments to be deleted asynchronously once they breach the retention time or retention size, then set `remote.storage.enable = false` and `remote.log.delete.on.disable = true`. This step is optional.
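For illustration, a minimal sketch of this graceful sequence using the Kafka Admin client. The topic name and bootstrap address are placeholders, and `remote.copy.disabled` / `remote.log.delete.on.disable` are the topic configs proposed in this thread (assumptions here, not released configs):

import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class GracefulTieredDisable {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address

        ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "my-tiered-topic"); // placeholder topic

        try (Admin admin = Admin.create(props)) {
            // Step (1): stop uploading new segments; existing remote data stays readable
            // and is removed over time by the retention threads.
            admin.incrementalAlterConfigs(Map.of(topic, List.of(
                    new AlterConfigOp(new ConfigEntry("remote.copy.disabled", "true"),
                                      AlterConfigOp.OpType.SET))))
                 .all().get();

            // Step (2), optional: run only after retention has deleted all remote segments.
            // Fully disable tiered storage and explicitly acknowledge remote-log deletion.
            admin.incrementalAlterConfigs(Map.of(topic, List.of(
                    new AlterConfigOp(new ConfigEntry("remote.storage.enable", "false"),
                                      AlterConfigOp.OpType.SET),
                    new AlterConfigOp(new ConfigEntry("remote.log.delete.on.disable", "true"),
                                      AlterConfigOp.OpType.SET))))
                 .all().get();
        }
    }
}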
On Thu, Jul 25, 2024 at 11:13 PM Kamal Chandraprakash <kamal.chandraprak...@gmail.com> wrote:
> Hi Chia-Ping,
>
> Thanks for the review!
>
> > If so, what is the purpose of `remote.log.delete.on.disable=false`?
>
> IIUC, the purpose of `remote.log.delete.on.disable` is to get explicit confirmation from the user before deleting the remote log segments. The concern raised in the thread is that if the user accidentally changes the value of `remote.storage.enable` from true to false, then the remote segments get lost.
>
> For ungraceful disablement, i.e. disabling remote storage for the topic and deleting all the remote segments, the user should set both configs at once:
>
> (1) remote.storage.enable = false and remote.log.delete.on.disable = true
>
> If the user accidentally sets only remote.storage.enable = false and leaves `remote.log.delete.on.disable` at its default value of `false`, then we will throw a ConfigException to prevent the deletion of remote logs.
>
> For graceful disablement, the user should set:
>
> (1): remote.copy.disabled = true.
> (2): Wait for all the remote segments to be deleted asynchronously once they breach the retention time or size, then set `remote.storage.enable = false`. This step is optional.
>
> Luke,
>
> In ZK mode, once the topic config value gets updated, it gets saved in the /configs/topics/<topic> znode. If we throw an exception from the server for an invalid config, there will be an inconsistency between the CLI tools and the actual state of the topic in the cluster. This can cause confusion for users about whether tiered storage is disabled or not. I don't know how the KRaft topic config propagation/validation works.
>
> --
> Kamal
>
> On Thu, Jul 25, 2024 at 7:10 PM Chia-Ping Tsai <chia7...@gmail.com> wrote:
>
>> remote.storage.enable=false
>> remote.log.delete.on.disable=false (default)
>> If the topic config is set to this, or changed to this, we'll return ConfigException during validation.
>>
>> Pardon me, I'm a bit confused.
>>
>> When `remote.storage.enable=true`, `remote.log.delete.on.disable=false` is a no-op.
>> When `remote.storage.enable=false`, `remote.log.delete.on.disable=false` is an error.
>>
>> If `remote.log.delete.on.disable` must be true when setting `remote.storage.enable` to false, does it mean changing `remote.storage.enable` to false is expected to delete remote storage topic data?
>>
>> If so, what is the purpose of `remote.log.delete.on.disable=false`?
>>
>> Best,
>> Chia-Ping
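To make the rule being debated here concrete, a minimal sketch of the proposed validation follows. It assumes the check applies to a config update on a topic that currently has tiered storage enabled; the class and method names are illustrative, not the actual LogConfig validation code:

import java.util.Map;
import org.apache.kafka.common.config.ConfigException;

// Sketch only: reject a topic config update that turns tiered storage off
// without explicitly acknowledging that the remote log segments will be deleted.
public class DisablementValidator {

    public static void validateDisable(boolean currentlyEnabled, Map<String, String> newConfig) {
        boolean newEnabled =
                Boolean.parseBoolean(newConfig.getOrDefault("remote.storage.enable", "false"));
        boolean deleteOnDisable =
                Boolean.parseBoolean(newConfig.getOrDefault("remote.log.delete.on.disable", "false"));

        if (currentlyEnabled && !newEnabled && !deleteOnDisable) {
            // Disabling tiered storage deletes the remote segments, so require an
            // explicit opt-in instead of silently losing data.
            throw new ConfigException(
                "remote.storage.enable cannot be set to false unless " +
                "remote.log.delete.on.disable=true, because the remote log segments " +
                "for this topic will be deleted.");
        }
    }
}

Whether the same check should also apply to topics that never enabled tiered storage is exactly the ambiguity discussed later in this thread.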
>> Luke Chen <show...@gmail.com> wrote on Thu, Jul 25, 2024 at 8:51 PM:
>>
>> > Hi Christo,
>> >
>> > Thanks for your reply.
>> >
>> > > keep the remote.log.disable.policy, but only allow it to take a value of "delete".
>> >
>> > I agree, or maybe make it a boolean value and rename it to `remote.log.delete.on.disable`, which is clearer.
>> > And because of this new config, there will be a case where the config looks like this:
>> >
>> > remote.storage.enable=false
>> > remote.log.delete.on.disable=false (default)
>> >
>> > That means, in this case, we'll keep all remote storage data, but close all remote log tasks and make "log start offset = local log start offset". This will leave the remote storage metadata in an unknown state because the data in remote storage is no longer accessible (since the log start offset moved to LLSO). And once `remote.storage.enable` is re-enabled for this topic, the old remote log metadata will be included, but the log start offset will no longer be what we expect.
>> >
>> > So, I'd like to propose that we don't allow this configuration:
>> >
>> > remote.storage.enable=false
>> > remote.log.delete.on.disable=false (default)
>> >
>> > If the topic config is set to this, or changed to this, we'll return ConfigException during validation.
>> >
>> > To make it clear, this is the new proposed solution:
>> > https://docs.google.com/document/d/1Y_cSkXr-qQiFFlFoGqfzGHE9m9MnIvZSgGpFP5l5o4I/edit
>> >
>> > Let me know what you think.
>> >
>> > Thanks.
>> > Luke
>> >
>> > On Thu, Jul 25, 2024 at 8:07 PM Christo Lolov <christolo...@gmail.com> wrote:
>> >
>> > > Hello!
>> > >
>> > > Thank you for raising this!
>> > >
>> > > Up to now KIP-950 took the stance that you can disable tiering whenever you wish as long as you specify what you would like to do with the data in remote. Amongst other things it also made the promise that it will not delete data without a user explicitly saying that they want their data deleted. In other words, there is a 2-step verification that the user truly wants their data deleted.
>> > >
>> > > From the table of the new proposal I am left with the impression that the moment a user tries to disable tiering their data will be deleted. In other words, there is no 2-step verification that they want their data deleted.
>> > >
>> > > On a first read, I wouldn't be opposed to this proposal since it provides a neat alternative to the tiered epoch, as long as there is still a 2-step verification that the user is aware their data will be deleted. I think that a reasonable way to achieve this is to keep the remote.log.disable.policy, but only allow it to take a value of "delete".
>> > >
>> > > What are your thoughts?
>> > >
>> > > Best,
>> > > Christo
>> > >
>> > > On Thu, 25 Jul 2024 at 12:10, Luke Chen <show...@gmail.com> wrote:
>> > >
>> > > > Hi all,
>> > > >
>> > > > I just found that the table is not displayed correctly in the email. I've put the table content in a Google doc here:
>> > > > https://docs.google.com/document/d/1Y_cSkXr-qQiFFlFoGqfzGHE9m9MnIvZSgGpFP5l5o4I/edit?usp=sharing
>> > > >
>> > > > Thanks.
>> > > > Luke
>> > > >
>> > > > On Thu, Jul 25, 2024 at 6:30 PM Luke Chen <show...@gmail.com> wrote:
>> > > >
>> > > > > Hi all,
>> > > > >
>> > > > > While implementing the feature in KRaft mode, I found something we need to change in the original proposal:
>> > > > >
>> > > > > (1) In the KIP section "Disablement - KRaft backed Cluster"
>> > > > > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-950%3A++Tiered+Storage+Disablement#KIP950:TieredStorageDisablement-Disablement-KRaftBackedCluster>,
>> > > > > we said: Controller persists configuration change and completes disablement:
>> > > > >
>> > > > > (a) The controller creates a ConfigRecord and persists it in the metadata topic.
>> > > > > (b) The controller creates a TopicRecord to increment the tiered_epoch and update the tiered_state to the DISABLED state.
>> > > > > (c) This update marks the completion of the disablement process, indicating that tiered storage has been successfully disabled for the KRaft-backed clusters. Similar to topic deletion, all replicas will eventually pick up the changes from the cluster metadata topic and apply them to their own state. Any deletion failures will be picked up by the expiration threads, which should be deleting data before the log start offset. If the retention policy is delete, a new expiration thread will be started on leadership change on any historical tiered topic to confirm that there aren't any leftover segments in remote storage which need deletion. After a cycle in which it didn't delete anything, it will die.
>> > > > >
>> > > > > For step (b), I don't think the controller needs to create a TopicRecord because:
>> > > > > 1. The broker can fetch the "tiered_state" from the ConfigRecord.
>> > > > > 2. The "tiered_epoch" is not necessary because the Raft protocol will keep the order for us. The broker can rely on the Raft protocol and apply the records in order to get the expected results.
>> > > > > 3. Marking the completion of the disablement process is not necessary in KRaft, because once the ConfigRecord is accepted by the controller, it must be applied by all the observers "in order".
>> > > > >
>> > > > > So, I'd like to propose removing step (b) in KRaft mode.
>> > > > >
>> > > > > (2) The current configuration confuses users and complicates the implementation. This is what we originally proposed in KIP-950:
>> > > > >
>> > > > > remote.storage.enable | remote.log.disable.policy (new) | remote storage data
>> > > > > true                  | null/retain/delete              | uploadable + readable
>> > > > > false                 | retain (default)                | readable, but remote storage is disabled? Users would also be surprised that this topic still reads data from remote storage.
>> > > > > false                 | delete                          | All remote data are deleted
>> > > > >
>> > > > > Note: This also makes development difficult because it's impossible to distinguish between (1) a topic that never enabled remote storage and (2) a topic that enabled and then disabled remote storage. A problem we have is when the broker starts up and tries to set the log start offset. Since remote storage is disabled, we would normally set it to the "local log start offset", but in case (2) we expect the topic to be treated as "remote storage enabled", which is confusing.
>> > > > >
>> > > > > Therefore, Kamal and I would like to propose a new version of the configuration:
>> > > > >
>> > > > > remote.storage.enable | remote.copy.disabled (new) | remote storage data
>> > > > > true                  | false (default)            | uploadable + readable
>> > > > > true                  | true                       | readable
>> > > > > false                 | true/false                 | All remote data are deleted
>> > > > >
>> > > > > The advantage is that this config makes it clear to users what they are configuring, and the result is what they expect. Also, on the implementation side, we can still rely on "remote.storage.enable" to identify whether this feature is on or off.
>> > > > >
>> > > > > Any thoughts about it?
>> > > > >
>> > > > > Thank you.
>> > > > > Luke
>> > > > >
>> > > > > On Thu, May 30, 2024 at 6:50 PM David Jacot <dja...@confluent.io.invalid> wrote:
>> > > > >
>> > > > >> Hi all,
>> > > > >>
>> > > > >> Thanks for the KIP. This is definitely a worthwhile feature. However, I am a bit sceptical about the ZK part of the story. The 3.8 release is supposed to be the last one supporting ZK, so I don't really see how we could bring it to ZK, knowing that we don't plan to do a 3.9 release (current plan). I strongly suggest clarifying this before implementing the ZK part in order to avoid having new code [1] being deleted right after 3.8 is released :). Personally, I agree with Chia-Ping and Mickael. We could drop the ZK part.
>> > > > >>
>> > > > >> [1] https://github.com/apache/kafka/pull/16131
>> > > > >>
>> > > > >> Best,
>> > > > >> David
>> > > > >>
>> > > > >> On Tue, May 28, 2024 at 1:31 PM Mickael Maison <mickael.mai...@gmail.com> wrote:
>> > > > >>
>> > > > >> > Hi,
>> > > > >> >
>> > > > >> > I agree with Chia-Ping, I think we could drop the ZK variant altogether, especially if this is not going to make it into 3.8.0. Even if we end up needing a 3.9.0 release, I wouldn't write a bunch of new ZooKeeper-related code in that release only to delete it all right after in 4.0.
>> > > > >> >
>> > > > >> > Thanks,
>> > > > >> > Mickael
>> > > > >> >
>> > > > >> > On Fri, May 24, 2024 at 5:03 PM Christo Lolov <christolo...@gmail.com> wrote:
>> > > > >> > >
>> > > > >> > > Hello!
>> > > > >> > >
>> > > > >> > > I am closing this vote as ACCEPTED with 3 binding +1 (Luke, Chia-Ping and Satish) and 1 non-binding +1 (Kamal) - thank you for the reviews!
>> > > > >> > >
>> > > > >> > > Realistically, I don't think I have the bandwidth to get this into 3.8.0. Due to this, I will tentatively mark the ZooKeeper part for 3.9 if the community decides that they do in fact want one more 3.x release. I will mark the KRaft part as ready to be started, aiming for either 4.0 or 3.9.
>> > > > >> > > >> > > > >> > > Best, >> > > > >> > > Christo >> > > > >> > >> > > > >> >> > > > > >> > > > >> > > >> > >> >