Re: [DISCUSS] FLIP-505: Flink History Server Scalability Improvements, Remote Data Store Fetch and Per Job Fetch

Venkatakrishnan Sowrirajan Mon, 12 May 2025 11:02:46 -0700

> Regarding decoupling the two features, would your suggestion be to
> separate them into two separate FLIPs?


Sorry for the late response.

Yes, that is correct. If these two features are somewhat coupled with each
other, then it makes sense to address them in the same FLIP; otherwise, I
think it would be better to tackle them as two separate FLIPs.

Regards
Venkata krishnan


On Mon, Mar 3, 2025 at 1:42 PM Allison <[email protected]> wrote:

> Hi Yanquan,
>
> I've updated the FLIP to contain the default values, thanks for your help!
>
> Sincerely
> - Allison
>
> On Thu, Jan 30, 2025 at 3:21 AM Yanquan Lv <[email protected]> wrote:
>
> > Thank you for your explanation. It resolves most of my previous
> > questions.
> >
> > Regarding the second point, I would suggest clarifying the default
> > values for the newly added parameters in the `Public Interfaces` section.
> >
> > ---------- Forwarded message ---------
> > From: Allison <[email protected]>
> > Date: Thu, Jan 30, 2025 at 3:42 AM
> > Subject: Re: [DISCUSS] FLIP-505: Flink History Server Scalability
> > Improvements, Remote Data Store Fetch and Per Job Fetch
> > To: <[email protected]>
> >
> >
> > Hi Yanquan,
> >
> > Thanks for taking a look at this. Re: your questions:
> >
> > 1. Yes, I've updated the FLIP to make this clearer: it involves changing
> > the existing configuration historyserver.archive.retained-jobs to
> > historyserver.archive.cached-retained-jobs. The number of jobs stored
> > remotely can be unbounded; the thought behind this is that the remote
> > data store can be cleaned up or limited by a separate mechanism that can
> > be customized for each individual use case.
> > 2. Could you clarify this a bit? I'm not sure I understand this part. Do
> > you mean adding to the FLIP the values the configurations would take if
> > they are not explicitly set?
> > 3. historyserver.archive.fs.refresh-interval is the time between
> > successive calls to the remote data store to find new data; that is, it
> > configures how often the FHS polls the remote data store for new files.
> > The remote data store is written to whenever a job finishes.
> >
> > Hope this clarifies some things.
> >
> > Best,
> > - Allison
> >
> >
> > On Mon, Jan 27, 2025 at 7:10 PM Yanquan Lv <[email protected]> wrote:
> >
> > > Hi, Allison. Thanks for driving this FLIP.
> > > I have some questions to confirm:
> > >
> > > 1. I can't find any existing configuration named
> > > `historyserver.archive.cached-retained-jobs`. I guess what you mean is
> > > changing the existing configuration `historyserver.archive.retained-jobs`
> > > to `historyserver.archive.cached-retained-jobs`. If so, and we only limit
> > > the number of retained jobs stored locally, is the number of retained jobs
> > > stored remotely unbounded?
> > > 2. I think it would be better to specify the default values of the
> > > options being added to HistoryServerOptions.
> > > 3. Does `historyserver.archive.fs.refresh-interval` apply to both local
> > > and remote storage simultaneously?
> > >
> > > Best,
> > > Yanquan
> > >
> > > On Fri, Jan 17, 2025 at 8:07 AM Allison <[email protected]> wrote:
> > >
> > > > Hi everyone,
> > > >
> > > > I would like to initiate a discussion of the FLIP below, which enhances
> > > > the Flink History Server to allow greater scalability of the service.
> > > >
> > > > Motivation:
> > > >
> > > > Currently, the Flink History Server (FHS) is limited in the number of
> > > > job archives it can serve by the storage capacity of the node that the
> > > > FHS runs on. Job archives are stored locally in a cache: each single
> > > > JSON archive file is expanded into a local directory based on its
> > > > contents. This not only uses up local storage space, but, because the
> > > > FHS expands the job archives into a nested directory structure, jobs
> > > > with a large number of TaskManagers or subtasks also often exhaust the
> > > > available inodes. In order to make the FHS more performant, we would
> > > > like to decouple the FHS job archive storage from the local cache, so
> > > > that job archives can be stored in and fetched from a remote file store.
> > > >
> > > > FLIP proposal document:
> > > >
> > > > https://urldefense.com/v3/__https://cwiki.apache.org/confluence/display/FLINK/FLIP*505*3A*Flink*History*Server*Scability*Improvements*2C*Remote*Data*Store*Fetch*and*Per*Job*Fetch__;KyUrKysrKyUrKysrKysrKw!!IKRxdwAv5BmarQ!cy7YUT3RVhkz3ixGuldCgf5lTCb3IMzUuAUClyB3qRuI0vAjYfvNVmw2NOggm06YnRGkmQ-3hMpOp0Ot7yRPK54$
> > > >
> > > > Thanks!
> > > >
> > > > Best,
> > > > - Allison Chang
> > > >
> > >
> >
>
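The two mechanisms discussed in this thread, a local cache capped by the proposed `historyserver.archive.cached-retained-jobs` and a poll of the remote store every `historyserver.archive.fs.refresh-interval`, can be illustrated with the minimal Java sketch that follows. It is not the FLIP's or Flink's implementation: the class `ArchiveFetchSketch`, its methods, and the two functions standing in for listing and downloading remote archives are hypothetical, and least-recently-used eviction is an assumed policy that the thread does not specify.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical sketch: names and the LRU policy are assumptions, not Flink APIs.
public class ArchiveFetchSketch {

    // Bounded local cache (stand-in for cached-retained-jobs), evicting the least recently used job.
    static final class LocalArchiveCache extends LinkedHashMap<String, String> {
        private final int limit;

        LocalArchiveCache(int limit) {
            super(16, 0.75f, true); // access order: iteration starts at the least recently used entry
            this.limit = limit;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
            return size() > limit; // evicts only the local copy; the remote archive is untouched
        }
    }

    private final Supplier<List<String>> listRemoteJobIds;      // hypothetical: lists job ids in the remote store
    private final Function<String, String> fetchRemoteArchive;  // hypothetical: downloads one job's archive JSON
    private final Set<String> knownJobIds = new LinkedHashSet<>(); // unbounded: remote retention is handled elsewhere
    private final LocalArchiveCache cache;

    ArchiveFetchSketch(Supplier<List<String>> listRemoteJobIds,
                       Function<String, String> fetchRemoteArchive,
                       int cachedRetainedJobs) {
        this.listRemoteJobIds = listRemoteJobIds;
        this.fetchRemoteArchive = fetchRemoteArchive;
        this.cache = new LocalArchiveCache(cachedRetainedJobs);
    }

    // One polling cycle: returns job ids that appeared in the remote store since the last poll.
    synchronized List<String> refresh() {
        List<String> discovered = new ArrayList<>();
        for (String jobId : listRemoteJobIds.get()) {
            if (knownJobIds.add(jobId)) {
                discovered.add(jobId);
            }
        }
        return discovered;
    }

    // Per-job fetch: serve from the local cache, downloading from the remote store on a miss.
    synchronized String getArchive(String jobId) {
        if (!knownJobIds.contains(jobId)) {
            return null; // not discovered by any refresh cycle yet
        }
        return cache.computeIfAbsent(jobId, fetchRemoteArchive);
    }

    // Snapshot of the job ids currently held in the local cache.
    synchronized Set<String> cachedJobIds() {
        return new LinkedHashSet<>(cache.keySet());
    }

    // Runs refresh() once per interval, playing the role of historyserver.archive.fs.refresh-interval.
    ScheduledFuture<?> start(ScheduledExecutorService executor, Duration refreshInterval) {
        return executor.scheduleWithFixedDelay(
                this::refresh, 0, refreshInterval.toMillis(), TimeUnit.MILLISECONDS);
    }
}
```

Because only the local cache is bounded, evicting a job frees local disk space and inodes without touching its remote archive; a later request for that job simply triggers another remote fetch. A caller could start polling with something like `start(Executors.newSingleThreadScheduledExecutor(), Duration.ofSeconds(10))`, where the 10-second interval is an arbitrary example rather than a Flink default.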
