Hi all,

Happy New Year! Bumping this thread again for more discussion before the vote starts. Thanks a lot!
Regards,
Jian

jian fu <[email protected]> wrote on Mon, Dec 15, 2025 at 20:00:

> Hi all,
>
> Bumping this thread for more discussion. I'd really appreciate more suggestions on this optional feature for tiered storage. Thanks a lot!
>
> Regards,
> Jian
>
> jian fu <[email protected]> wrote on Thu, Dec 4, 2025 at 21:54:
>
>> Hi all,
>>
>> I updated the KIP content following Kamal and Haiying's discussion:
>> 1. Explicitly emphasized that this is a topic-level optional feature intended for users who prioritize cost.
>> 2. Added a cost-saving calculation example.
>> 3. Added details about the operational drawback of this feature: extra disk capacity is needed during a long remote-storage outage.
>> 4. Added the scenarios where enabling the feature may not be suitable or beneficial, such as topics whose remote:local retention ratio is very large.
>>
>> Thanks again for joining the discussion.
>>
>> Regards,
>> Jian
>>
>> jian fu <[email protected]> wrote on Tue, Dec 2, 2025 at 20:27:
>>
>>> Hi Kamal,
>>>
>>> I think I understand what you mean now. I've updated the picture in the link (https://github.com/apache/kafka/pull/20913#issuecomment-3601274230). Could you double-check whether we've reached the same understanding?
>>> In short, the drawback of this KIP is that, during a long remote-storage outage, it occupies more local disk. The maximum extra usage equals the redundant portion we would otherwise save, and after the outage recovers, usage returns to the baseline.
>>> Please correct me if my understanding is wrong! Thanks again.
>>>
>>> Regards,
>>> Jian
>>>
>>> Kamal Chandraprakash <[email protected]> wrote on Tue, Dec 2, 2025 at 19:29:
>>>
>>>> The already-uploaded segments are eligible for deletion from the broker. So, when remote storage is down, those segments can be deleted as per the local retention settings and new segments can occupy that space.
>>>> This gives the admin more time to act when remote storage is down for a long time.
>>>>
>>>> This is from a reliability perspective.
>>>>
>>>> On Tue, Dec 2, 2025 at 4:47 PM jian fu <[email protected]> wrote:
>>>>
>>>>> Hi Kamal and Haiying Cai,
>>>>>
>>>>> You may have noticed that my Kafka clusters are set to 1 day local + 3-7 days remote, while Haiying Cai's configuration is 3 hours local + 3 days remote.
>>>>>
>>>>> Let me explain my configuration. I try to avoid the latency of lagging consumers reading from remote storage. Some applications may hit unexpected issues, and we need to give them enough time to recover; during that period we don't want consumers fetching from remote storage and hurting the whole cluster. So one day of local retention is our expectation.
>>>>>
>>>>> I saw this statement in Haiying Cai's KIP-1248: "Currently, when a new consumer or a fallen-off consumer requires fetching messages from a while ago, and those messages are no longer present in the Kafka broker's local storage, the broker must download the message from the remote tiered storage and subsequently transfer the data back to the consumer." Extending the local retention time is how we avoid that issue (here we don't consider the case of a new consumer using the earliest-offset strategy; that rarely happens in our cases).
>>>>>
>>>>> So, based on my configuration, one day's worth of duplicated segments sits wasted in remote storage. We don't use those copies for real-time analytics, fast broker restarts, or anything else, which is why I propose this KIP as a topic-level optional feature to help us reduce waste and save money.
>>>>> Regards,
>>>>> Jian
>>>>>
>>>>> jian fu <[email protected]> wrote on Tue, Dec 2, 2025 at 18:42:
>>>>>
>>>>>> Hi Kamal,
>>>>>>
>>>>>> Thanks for joining this discussion. Let me try to clarify my understanding of your good questions:
>>>>>>
>>>>>> 1. Kamal: Do you also have to update the RemoteCopy lag segments and bytes metrics?
>>>>>> Jian: The code only delays the upload time of local segments, so it seems there is no need to change any lag segments or metrics, right?
>>>>>>
>>>>>> 2. Kamal: As Haiying mentioned, the segments eventually get uploaded to remote, so I'm not sure about the benefit of this proposal. Also, remote storage cost is considered low compared to broker local disk.
>>>>>> Jian: The cost benefit is about the total size occupied. Take AWS S3 as an example: the tiered price for 1 GB is 0.02 USD (see https://calculator.aws/#/createCalculator/S3). It is cheaper than local disk, but as I mentioned, the money saved depends on the ratio of local to remote retention time. If you set a long remote retention time, the benefit is small; it is more about avoiding waste than saving cost. That is why I propose it as a topic-level optional configuration instead of a default feature.
>>>>>>
>>>>>> 3. Kamal: It provides some cushion during third-party object storage downtime.
>>>>>> Jian: I drew a picture to try to understand the logic (https://github.com/apache/kafka/pull/20913#issuecomment-3601274230). Could you check whether my understanding is right? It seemed to me there is no difference between the two, so maybe we need to discuss this point more. The only difference may be that we temporarily use a little more local disk because of the delayed upload.
>>>>>> So in the original proposal I wanted to upload only N-1 segments, but it seems the gain from that is small.
>>>>>>
>>>>>> By the way, I want to clarify one basic rule: this feature doesn't change the default behavior, and the amount saved is not large in all cases. It suits the subset of topics with a low remote/local retention ratio, such as 7 days/1 day or 3 days/1 day.
>>>>>> Finally, thanks again for your time and your comments. All the questions are valid and help us think this through.
>>>>>>
>>>>>> Regards,
>>>>>> Jian
>>>>>>
>>>>>> Kamal Chandraprakash <[email protected]> wrote on Tue, Dec 2, 2025 at 17:41:
>>>>>>
>>>>>>> 1. Do you also have to update the RemoteCopy lag segments and bytes metric?
>>>>>>> 2. As Haiying mentioned, the segments eventually get uploaded to remote, so I'm not sure about the benefit of this proposal. And remote storage cost is considered low when compared to broker local disk. It provides some cushion during third-party object storage downtime.
>>>>>>>
>>>>>>> On Tue, Dec 2, 2025 at 2:45 PM Kamal Chandraprakash <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi Jian,
>>>>>>>>
>>>>>>>> Thanks for the KIP!
>>>>>>>>
>>>>>>>> When remote storage is unavailable for a few hours, with lazy upload there is a risk of the broker disk filling up soon, so the admin has to configure the local retention configs carefully. With eager upload, disk utilization won't grow beyond the local retention window (the expectation is that all the passive segments are uploaded), which gives the admin some time to take action based on the situation.
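To make this trade-off concrete, here is a rough back-of-the-envelope model (my own sketch, not from the KIP or PR; it assumes a constant ingest rate and that a local segment can be deleted only after it has been uploaded):

```python
def local_disk_gb(ingest_gb_per_h, local_retention_h, outage_h, lazy):
    """Approximate local disk usage after `outage_h` hours of remote-storage
    outage, assuming a segment is locally deletable only once uploaded.

    Eager upload: only data produced during the outage is stuck un-uploaded.
    Lazy upload (this KIP): the last `local_retention_h` of pre-outage data
    was also never uploaded, so it is stuck too.
    """
    r, L, D = ingest_gb_per_h, local_retention_h, outage_h
    stuck = r * (L + D) if lazy else r * D  # un-uploaded, hence undeletable, data
    within_retention = r * L                # data kept locally in any case
    return max(stuck, within_retention)

# 10 GB/h ingest, 24 h local retention, 12 h outage:
eager = local_disk_gb(10, 24, 12, lazy=False)  # 240 GB (outage shorter than retention)
lazy = local_disk_gb(10, 24, 12, lazy=True)    # 360 GB
# The extra usage is bounded by one local-retention window of data
# (here 240 GB), i.e. exactly the redundant remote copy the KIP avoids.
```

This matches both sides of the discussion: eager upload buys headroom during an outage, and the worst-case extra disk under lazy upload equals the redundant portion being saved.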
>>>>>>>> --
>>>>>>>> Kamal
>>>>>>>>
>>>>>>>> On Tue, Dec 2, 2025 at 10:28 AM Haiying Cai via dev <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Jian,
>>>>>>>>>
>>>>>>>>> Understood that this is an optional feature and the cost saving depends on the ratio between local.retention.ms and total retention.ms.
>>>>>>>>> In our setup, we have local retention set to 3 hours and total retention set to 3 days, so the saving is not going to be significant.
>>>>>>>>>
>>>>>>>>> On 2025/12/01 05:33:11 jian fu wrote:
>>>>>>>>>> Hi Haiying Cai,
>>>>>>>>>>
>>>>>>>>>> Thanks for joining the discussion on this KIP. All of your concerns are valid, and that is exactly why I introduced a topic-level configuration to make this feature optional. By default, the behavior remains unchanged. Only users who are not pursuing faster broker boot time or other optimizations, and who care more about cost, would enable this option on some topics to save resources.
>>>>>>>>>>
>>>>>>>>>> Regarding the cost itself: the actual savings depend on the ratio between local retention and remote retention. In the KIP/PR, I provided a test example: if we configure 1 day of local retention and 2 days of remote retention, we can save about 50%. And realistically, I don't think anyone would boldly set local retention to a very small value (such as minutes), due to the latency concerns associated with remote storage.
>>>>>>>>>> In short, the feature will help reduce cost, and the amount saved simply depends on the ratio. Take my company's usage as a real example: we configure most topics with 1 day of local retention and 3-7 days of remote retention (3 days for log/metric topics, 7 days for normal business topics), and we don't care about boot speed or anything else, so this KIP lets us save 1/7 to 1/3 of the total remote-storage disk usage.
>>>>>>>>>>
>>>>>>>>>> Anyway, this is just a topic-level optional feature which doesn't take away the benefits of the current design. Thanks again for the discussion. I can update the KIP to better classify the scenarios where this optional feature is not suitable; currently, I only listed real-time analytics as the negative example.
>>>>>>>>>>
>>>>>>>>>> Further discussion to help make this KIP more complete is welcome. Thanks!
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Jian
>>>>>>>>>>
>>>>>>>>>> Haiying Cai via dev <[email protected]> wrote on Mon, Dec 1, 2025 at 12:40:
>>>>>>>>>>
>>>>>>>>>>> Jian,
>>>>>>>>>>>
>>>>>>>>>>> Thanks for the contribution. But I feel that uploading the local segment file to remote storage ASAP is advantageous in several scenarios:
>>>>>>>>>>>
>>>>>>>>>>> 1. It enables fast bootstrapping of a new broker.
>>>>>>>>>>> A new broker doesn't have to replicate all the data from the leader broker; it only needs to replicate the data from the tail of the remote log segments to the current end of the topic (LSO), since all the other data is in remote tiered storage and can be downloaded lazily later. This is what KIP-1023 is trying to solve.
>>>>>>>>>>> 2. Although nobody has proposed a KIP to allow a consumer client to read from remote tiered storage directly, this would help fall-behind consumers do catch-up reads or perform backfills. That path allows a consumer backfill to finish without polluting the broker's page cache. The earlier the data is in remote tiered storage, the more advantageous it is for the client.
>>>>>>>>>>>
>>>>>>>>>>> I think in your proposal you are delaying uploading the segment, but the file will still be uploaded at a later time. I guess this can save a few hours of storage cost for that file in remote storage; I'm not sure that is a significant saving (if the file needs to stay in remote tiered storage for several days or weeks due to the retention policy).
>>>>>>>>>>>
>>>>>>>>>>> On 2025/11/19 13:29:11 jian fu wrote:
>>>>>>>>>>>> Hi everyone, I'd like to start a discussion on KIP-1241; the goal is to reduce remote storage usage.
>>>>>>>>>>>> KIP: https://cwiki.apache.org/confluence/display/KAFKA/KIP-1241%3A+Reduce+tiered+storage+redundancy+with+delayed+upload
>>>>>>>>>>>> The draft PR: https://github.com/apache/kafka/pull/20913
>>>>>>>>>>>>
>>>>>>>>>>>> Problem: Currently, Kafka's tiered storage implementation uploads all non-active local log segments to remote storage immediately, even when they are still within the local retention period. This results in redundant storage of the same data in both the local and remote tiers.
>>>>>>>>>>>>
>>>>>>>>>>>> When there is no requirement for real-time analytics or immediate consumption from remote storage, this has the following drawbacks:
>>>>>>>>>>>>
>>>>>>>>>>>> 1. It wastes storage capacity and cost: the same data is stored twice during the local retention window.
>>>>>>>>>>>> 2. It provides no immediate benefit: during the local retention period, reads prioritize local data, making the remote copy unnecessary.
>>>>>>>>>>>>
>>>>>>>>>>>> So this KIP reduces tiered-storage redundancy with delayed upload. You can check a test result example here directly: https://github.com/apache/kafka/pull/20913#issuecomment-3547156286
>>>>>>>>>>>>
>>>>>>>>>>>> Looking forward to your feedback! Best regards, Jian

--
Regards,

Fu.Jian
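The savings figures quoted throughout the thread (about 50% for 1 day local / 2 days total, 1/7 to 1/3 for 1 day local / 3-7 days total, and only a few percent for 3 hours local / 3 days total) all follow from the same ratio. A minimal sketch, assuming a constant ingest rate and that remote retention equals total retention:

```python
def redundant_fraction(local_retention_h, total_retention_h):
    """Fraction of remote-storage bytes that duplicate data still held
    locally, assuming eager upload and a constant ingest rate."""
    return local_retention_h / total_retention_h

# Figures quoted in this thread:
redundant_fraction(24, 48)      # 0.5   -> ~50% saving (1 day local, 2 days total)
redundant_fraction(24, 7 * 24)  # ~0.14 -> ~1/7 saving (1 day local, 7 days total)
redundant_fraction(3, 3 * 24)   # ~0.04 -> small saving (3 hours local, 3 days total)
```

This is why the feature pays off mainly for topics with a low remote:local retention ratio.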
