Hi Yuepeng,

It looks like this work could complement the change that I've proposed in FLIP-505. This addresses the question that Ryan asked about whether remotely stored job archives will be impacted if the retention is changed. Feel free to take a look at the FLIP as well as the PR for FLIP-505. Together, these two changes give us an opportunity to significantly improve the History Server.
FLIP-505: https://cwiki.apache.org/confluence/display/FLINK/FLIP+505%3A+Flink+History+Server+Scability+Improvements%2C+Remote+Data+Store+Fetch+and+Per+Job+Fetch
PR: https://github.com/apache/flink/pull/26878

Best,
Allison

On Thu, Aug 14, 2025 at 9:51 AM Yuepeng Pan <panyuep...@apache.org> wrote:

> Hi, Ryan van Huuksloot.
>
> > Might be worth stating that explicitly in the FLIP.
>
> Nice idea~ The sub-section added here[1] to clarify the item.
>
> Thanks a lot !
>
> [1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=332499857#FLIP490:EnhancedJobHistoryRetentionPoliciesforHistoryServer-Thetimingtocheckwhethertargetfileshaveexceededtheretentionthresholds
>
> Best,
> Yuepeng Pan
>
> On 2025/08/14 16:27:39 Ryan van Huuksloot wrote:
> > That sounds like a good option.
> >
> > Might be worth stating that explicitly in the FLIP.
> >
> > No other questions from me - will be a nice extension!
> >
> > Ryan van Huuksloot
> > Staff Engineer, Infrastructure | Streaming Platform
> >
> > On Thu, Aug 14, 2025 at 12:22 PM Yuepeng Pan <panyuep...@apache.org> wrote:
> >
> > > Hi, Ryan van Huuksloot.
> > >
> > > > Are you planning on having a thread to check for TTL? Or what is the plan
> > > > for TTL?
> > > > The quantity based would have a check when a new job is archived?
> > >
> > > Just like the implementation in the POC[1], if we continue following the
> > > process where HistoryServer#start method periodically invokes
> > > HistoryServerArchiveFetcher#fetchArchives
> > > based on 'historyserver.archive.fs.refresh-interval' to check
> > > whether target files should be retained, what do you think about it ?
> > > Of course, I'm very open to hearing about other potentially better
> > > implementation approaches.
> > > Please let me know what's your opinion.
> > > Thank you.
> > >
> > > [1] https://github.com/apache/flink/pull/26902
> > >
> > > Best,
> > > Yuepeng Pan
> > >
> > > On 2025/08/14 16:07:10 Ryan van Huuksloot wrote:
> > > > Thanks, sounds good.
> > > >
> > > > Are you planning on having a thread to check for TTL? Or what is the plan
> > > > for TTL?
> > > > The quantity based would have a check when a new job is archived?
> > > >
> > > > Ryan van Huuksloot
> > > > Staff Engineer, Infrastructure | Streaming Platform
> > > >
> > > > On Thu, Aug 14, 2025 at 12:04 PM Yuepeng Pan <panyuep...@apache.org> wrote:
> > > >
> > > > > Hi, Ryan van Huuksloot.
> > > > >
> > > > > Thank you very much for your reply.
> > > > >
> > > > > > Question: Is the History Server then going to delete the files stored?
> > > > > > (i.e. we use GCS, would it delete the files there as well?)
> > > > > > Or is this strictly what is shown in the UI?
> > > > >
> > > > > Yes, this feature introduced in the FLIP is a super-set of the original
> > > > > feature that is controlled by 'historyserver.archive.retained-jobs'.
> > > > >
> > > > > So if I understand correctly, after the new feature is introduced, it
> > > > > would affect the retention period of remote distributed storage jobs
> > > > > history files as well, not only for what is shown in the UI.
> > > > >
> > > > > Best,
> > > > > Yuepeng Pan
> > > > >
> > > > > At 2025-08-14 23:34:54, "Ryan van Huuksloot" <ryan.vanhuuksl...@shopify.com.INVALID> wrote:
> > > > > > I took a look. Overall it would be nice to have more ways to configure the
> > > > > > History Server.
> > > > > >
> > > > > > Question: Is the History Server then going to delete the files stored?
> > > > > > (i.e. we use GCS, would it delete the files there as well?)
> > > > > > Or is this strictly what is shown in the UI?
> > > > > >
> > > > > > Ryan van Huuksloot
> > > > > > Staff Engineer, Infrastructure | Streaming Platform
> > > > > >
> > > > > > On Thu, Aug 14, 2025 at 11:17 AM Yuepeng Pan <panyuep...@apache.org> wrote:
> > > > > >
> > > > > > > Bumping this thread. Thanks!
> > > > > > >
> > > > > > > Best,
> > > > > > > Yuepeng Pan
> > > > > > >
> > > > > > > On 2025/08/11 03:49:27 Yuepeng Pan wrote:
> > > > > > > > Hi community,
> > > > > > > >
> > > > > > > > Currently, HistoryServer supports only a quantity-based job archive
> > > > > > > > retention policy [1].
> > > > > > > > This is insufficient for scenarios such as:
> > > > > > > > - Time-based retention (e.g., last X days).
> > > > > > > > - Combined rules (e.g., within 7 days AND ≤100 jobs).
> > > > > > > >
> > > > > > > > To address these limitations, I’d like to start a discussion on FLIP-490 [2],
> > > > > > > > which proposes a more flexible job archive retention mechanism that
> > > > > > > > supports time-based, quantity-based, and composite strategies (with AND/OR
> > > > > > > > logic).
> > > > > > > >
> > > > > > > > Looking forward to your feedback.
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Yuepeng Pan
> > > > > > > >
> > > > > > > > [1] https://github.com/apache/flink/blob/cae5fb4d3b6d9e0c10c3539ea4994fc1ad463b70/flink-runtime-web/src/main/java/org/apache/flink/runtime/webmonitor/history/HistoryServer.java#L241
> > > > > > > > [2] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=332499857
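
For readers following the thread, the composite retention rule described in the original proposal (a time-based limit and a quantity-based limit combined with AND/OR) could look roughly like the sketch below. It is only an illustration under stated assumptions: every class, method, and parameter name here is hypothetical and is not taken from Flink, FLIP-490, or the POC PR. The reading of AND/OR used here (under AND an archive is kept only if it passes both checks, under OR if it passes either) is one possible interpretation; the FLIP defines the actual semantics.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: none of these names come from Flink or FLIP-490.
final class RetentionSketch {

    /** An archived job, reduced to an id and the time it was archived. */
    static final class Archive {
        final String jobId;
        final Instant archivedAt;

        Archive(String jobId, Instant archivedAt) {
            this.jobId = jobId;
            this.archivedAt = archivedAt;
        }
    }

    /** How the time-based and quantity-based checks are combined (assumed semantics). */
    enum Mode { AND, OR }

    /**
     * Returns the archives to keep, newest first.
     * Time check: archived no earlier than (now - ttl).
     * Quantity check: among the maxJobs most recently archived jobs.
     */
    static List<Archive> retain(
            List<Archive> archives, Duration ttl, int maxJobs, Mode mode, Instant now) {
        List<Archive> newestFirst = new ArrayList<>(archives);
        newestFirst.sort(Comparator.comparing((Archive a) -> a.archivedAt).reversed());

        Instant cutoff = now.minus(ttl);
        List<Archive> kept = new ArrayList<>();
        for (int i = 0; i < newestFirst.size(); i++) {
            Archive a = newestFirst.get(i);
            boolean withinTtl = !a.archivedAt.isBefore(cutoff);
            boolean withinCount = i < maxJobs;
            boolean keep = (mode == Mode.AND)
                    ? (withinTtl && withinCount)
                    : (withinTtl || withinCount);
            if (keep) {
                kept.add(a);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Instant now = Instant.now();
        List<Archive> all = Arrays.asList(
                new Archive("job-a", now.minus(Duration.ofDays(1))),
                new Archive("job-b", now.minus(Duration.ofDays(3))),
                new Archive("job-c", now.minus(Duration.ofDays(10))));

        // "within 7 days AND at most 100 jobs", as in the example from the proposal
        for (Archive a : retain(all, Duration.ofDays(7), 100, Mode.AND, now)) {
            System.out.println("retained: " + a.jobId);
        }
    }
}
```

In a setup like the one discussed above, a check of this kind would presumably run on each periodic refresh, with the archives that are not returned being the candidates for deletion. That wiring is left out here because it depends on the History Server internals.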