Hi all,

I just noticed that the table is not displayed correctly in the email.
I've put the table content in a Google doc here:
<https://docs.google.com/document/d/1Y_cSkXr-qQiFFlFoGqfzGHE9m9MnIvZSgGpFP5l5o4I/edit?usp=sharing>

Thanks.
Luke

On Thu, Jul 25, 2024 at 6:30 PM Luke Chen <show...@gmail.com> wrote:

> Hi all,
>
> While implementing the feature in KRaft mode, I found some things we need
> to change in the original proposal:
>
> (1) In the "Disablement - KRaft backed Cluster
> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-950%3A++Tiered+Storage+Disablement#KIP950:TieredStorageDisablement-Disablement-KRaftBackedCluster>"
> section of the KIP, we said:
> Controller persists configuration change and completes disablement:
>
>    a. The controller creates a ConfigRecord and persists it in the
>    metadata topic.
>    b. The controller creates a TopicRecord to increment the tiered_epoch
>    and update the tiered_state to the DISABLED state.
>    c. This update marks the completion of the disablement process,
>    indicating that tiered storage has been successfully disabled for the
>    KRaft-backed clusters. Similar to topic deletion, all replicas will
>    eventually pick up the changes from the cluster metadata topic and apply
>    them to their own state. Any deletion failures will be picked up by the
>    expiration threads, which should be deleting data before the log start
>    offset. If the retention policy is delete, a new expiration thread will
>    be started on leadership change on any historical tiered topic to confirm
>    that there aren't any leftover segments in remote storage which need
>    deletion. After a cycle in which it didn't delete anything, it will die
>    (sketched below).
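>
> A minimal sketch of the self-terminating expiration thread in step (c)
> could look like this (the deletion pass is an assumed hook, not Kafka's
> actual internals):
>
>     import java.util.function.BooleanSupplier;
>
>     public class RemoteLogExpiration {
>         // Hypothetical sketch: "deleteOnePass" is an assumed hook that
>         // deletes leftover remote segments below the log start offset and
>         // reports whether this cycle deleted anything.
>         public static Thread start(BooleanSupplier deleteOnePass) {
>             Thread t = new Thread(() -> {
>                 // Keep running cycles while they still delete something; a
>                 // cycle that deletes nothing means no leftover segments
>                 // remain, so the thread dies.
>                 while (deleteOnePass.getAsBoolean()) {
>                     // previous cycle still found segments; run another one
>                 }
>             }, "remote-log-expiration");
>             t.start();
>             return t;
>         }
>     }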
>
> For step (b), I don't think the controller needs to create a TopicRecord,
> because:
> 1. The broker can fetch the "tiered_state" from the ConfigRecord.
> 2. The "tiered_epoch" is not necessary because the Raft protocol keeps the
> order for us. The broker can rely on the Raft protocol and apply the
> records in order to get the expected result.
> 3. Marking the completion of the disablement process is not necessary in
> KRaft, because once the ConfigRecord is accepted by the controller, it
> must be applied by all the observers "in order".
>
> So, I'd like to propose removing step (b) in KRaft mode.
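>
> To make the ordering argument concrete, here is a minimal, hypothetical
> sketch (not Kafka's actual internals) of a broker-side observer applying
> ConfigRecords strictly in metadata-log order:
>
>     import java.util.Map;
>     import java.util.concurrent.ConcurrentHashMap;
>
>     public class TieredStateApplier {
>         // topic name -> latest tiered-storage state seen in a ConfigRecord
>         private final Map<String, String> tieredState = new ConcurrentHashMap<>();
>
>         // Called once per ConfigRecord, in the exact order the records were
>         // committed to the Raft metadata log. The last record for a topic
>         // wins, so no tiered_epoch is needed to reject stale updates;
>         // Raft ordering already guarantees all observers converge.
>         public void onConfigRecord(String topic, String key, String value) {
>             if ("tiered_state".equals(key)) {  // illustrative key name
>                 tieredState.put(topic, value);
>             }
>         }
>     }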
>
> (2) The current configuration confuses both users and the implementation.
> This is what we originally proposed in KIP-950:
>
> remote.storage.enable | remote.log.disable.policy (new) | remote storage data
> ----------------------+---------------------------------+-----------------------------
> true                  | null/retain/delete              | uploadable + readable
> false                 | retain (default)                | readable
> false                 | delete                          | All remote data are deleted
>
> For the "false / retain" row: the data stays readable, but remote storage
> is disabled? Users would also be surprised that such a topic still reads
> data from remote storage.
>
> Note: This also makes development difficult, because we are unable to
> distinguish between:
>
> (1) a topic that never enabled remote storage
>
> (2) a topic that enabled and then disabled remote storage
>
> One problem arises when a broker starts up and tries to set the log start
> offset. Since remote storage is disabled, we would normally set it to the
> "local log start offset", but in case (2) we expect the topic to be
> treated as "remote storage enabled", which is confusing.
>
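> Here is a hypothetical sketch of that startup ambiguity (the names are
> made up for illustration, not Kafka's actual internals):
>
>     public final class StartupOffsets {
>         // With only remote.storage.enable=false, the broker cannot tell
>         // case (1) "never enabled" from case (2) "enabled then disabled",
>         // yet the two cases need different log start offsets.
>         static long logStartOffsetOnStartup(boolean remoteStorageEnable,
>                                             boolean everHadRemoteData,
>                                             long localLogStartOffset,
>                                             long remoteLogStartOffset) {
>             if (remoteStorageEnable) {
>                 return remoteLogStartOffset;
>             }
>             // Case (1): never enabled, so the local log start offset is right.
>             // Case (2): remote data was retained and is still readable, so
>             // the remote log start offset is the right answer. But
>             // "everHadRemoteData" is exactly the bit we cannot recover from
>             // the current configuration alone.
>             return everHadRemoteData ? remoteLogStartOffset : localLogStartOffset;
>         }
>     }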
>
> Therefore, Kamal and I would like to propose a new version of the
> configuration:
>
> remote.storage.enable | remote.copy.disabled (new) | remote storage data
> ----------------------+----------------------------+-----------------------------
> true                  | false (default)            | uploadable + readable
> true                  | true                       | readable
> false                 | true/false                 | All remote data are deleted
>
> The advantage is that this config makes it clear to users what is being
> configured, and the result matches their expectations.
> Also, on the implementation side, we can still rely on
> "remote.storage.enable" to identify whether this feature is on or off.
>
> Any thoughts about it?
>
> Thank you.
> Luke
>
>
>
> On Thu, May 30, 2024 at 6:50 PM David Jacot <dja...@confluent.io.invalid>
> wrote:
>
>> Hi all,
>>
>> Thanks for the KIP. This is definitely a worthwhile feature. However, I
>> am a bit sceptical about the ZK part of the story. The 3.8 release is
>> supposed to be the last one supporting ZK, so I don't really see how we
>> could bring this feature to ZK, knowing that we don't plan to do a 3.9
>> release (current plan). I strongly suggest clarifying this before
>> implementing the ZK part, in order to avoid having new code [1] deleted
>> right after 3.8 is released :). Personally, I agree with Chia-Ping and
>> Mickael. We could drop the ZK part.
>>
>> [1] https://github.com/apache/kafka/pull/16131
>>
>> Best,
>> David
>>
>> On Tue, May 28, 2024 at 1:31 PM Mickael Maison <mickael.mai...@gmail.com>
>> wrote:
>>
>> > Hi,
>> >
>> > I agree with Chia-Ping, I think we could drop the ZK variant
>> > altogether, especially if this is not going to make it in 3.8.0.
>> > Even if we end up needing a 3.9.0 release, I wouldn't write a bunch of
>> > new ZooKeeper-related code in that release to delete it all right
>> > after in 4.0.
>> >
>> > Thanks,
>> > Mickael
>> >
>> > On Fri, May 24, 2024 at 5:03 PM Christo Lolov <christolo...@gmail.com>
>> > wrote:
>> > >
>> > > Hello!
>> > >
>> > > I am closing this vote as ACCEPTED with 3 binding +1 (Luke, Chia-Ping
>> > > and Satish) and 1 non-binding +1 (Kamal) - thank you for the reviews!
>> > >
>> > > Realistically, I don't think I have the bandwidth to get this in 3.8.0.
>> > > Due to this, I will tentatively mark the ZooKeeper part for 3.9 if the
>> > > community decides that they do in fact want one more 3.x release.
>> > > I will mark the KRaft part as ready to be started and aiming for either
>> > > 4.0 or 3.9.
>> > >
>> > > Best,
>> > > Christo
>> >
>>
>
