Hi, Allison. Thanks for driving this FLIP. I have some questions to confirm:
1. I can't find any existing configuration named `historyserver.archive.cached-retained-jobs`. I guess you mean renaming the existing configuration `historyserver.archive.retained-jobs` to `historyserver.archive.cached-retained-jobs`. If so, and we only limit the number of retained jobs stored locally, is the number of retained jobs stored remotely unlimited?
2. I think it would be better for the FLIP to describe the default values of the options that will be added to HistoryServerOptions.
3. Does `historyserver.archive.fs.refresh-interval` apply to both local and remote storage?

Best,
Yanquan

On Fri, Jan 17, 2025 at 8:07 AM, Allison <achang5...@gmail.com> wrote:

> Hi everyone,
>
> I would like to initiate a discussion for the FLIP below, which enhances
> the Flink History Server to allow greater scalability of the service.
>
> Motivation:
>
> Currently, the Flink History Server (FHS) is limited in the number of job
> archives it can serve based on the storage capacity of the node that the
> FHS runs in. Job archives are stored locally in a cache which creates a
> local directory which is expanded out based on the contents of a single
> json archive file. This not only uses up local memory space, but also
> because of how the FHS expands the job archives into a nested directory
> structure, for jobs with a large number of taskmanagers or subtasks, inode
> space often runs out. In order to make the FHS more performant, we would
> like to introduce the ability to decouple the job archive storage for the
> FHS from being limited to the local cache, to being able to store and fetch
> job archives from a remote file store.
>
> FLIP proposal document:
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP+505%3A+Flink+History+Server+Scability+Improvements%2C+Remote+Data+Store+Fetch+and+Per+Job+Fetch
>
> Thanks!
>
> Best,
> - Allison Chang