Re: [DISCUSS] FLIP-505: Flink History Server Scability Improvements, Remote Data Store Fetch and Per Job Fetch

Yanquan Lv Thu, 30 Jan 2025 03:21:20 -0800

Thank you for your explanation. My previous questions have basically been
resolved.

Regarding the second point, I would suggest stating the default values of
the newly added parameters in the `Public Interfaces` section.

---------- Forwarded message ---------
From: Allison <achang5...@gmail.com>
Date: Thu, Jan 30, 2025 at 3:42 AM
Subject: Re: [DISCUSS] FLIP-505: Flink History Server Scability
Improvements, Remote Data Store Fetch and Per Job Fetch
To: <dev@flink.apache.org>

Hi Yanquan,

Thanks for taking a look at this. Re: your questions:

1. Yes, I've updated the FLIP to make this clearer: it involves changing
the existing configuration historyserver.archive.retained-jobs to
historyserver.archive.cached-retained-jobs. The number of jobs stored
remotely can be unlimited; the thought behind this is that the remote data
store can be cleaned up or limited by a separate mechanism that can be
customized to each individual use case.
2. Could you clarify this a bit? I'm not sure I understand this part. Do
you mean adding to the FLIP the values these configurations take when they
are not explicitly set?
3. historyserver.archive.fs.refresh-interval is the time between calls to
the remote data store to find new data; in other words, it configures how
often the FHS polls the remote data store for new files. The remote data
store is written to whenever a job finishes.
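To make points 1 and 3 concrete, here is a minimal Python sketch of the described behavior under stated assumptions. `RemoteArchiveStore`, `HistoryServerCache`, `poll_once` and `run_refresh_loop` are hypothetical names, not Flink classes or APIs. The local cache is assumed to evict the least-recently-used archive once it holds more than cached-retained-jobs entries (the FLIP may choose a different policy), a cache miss is assumed to fetch the single job from the remote store, and each `poll_once` call models one refresh cycle of historyserver.archive.fs.refresh-interval. The remote store is never trimmed here, matching the idea that remote cleanup is handled separately.

```python
import threading
from collections import OrderedDict
from typing import Dict, List, Optional, Set


class RemoteArchiveStore:
    """Hypothetical in-memory stand-in for the remote data store.

    A finished job writes its archive here. Nothing is ever evicted, mirroring
    the proposal that remote retention is handled by a separate cleanup process.
    """

    def __init__(self) -> None:
        self._archives: Dict[str, str] = {}

    def write(self, job_id: str, archive_json: str) -> None:
        self._archives[job_id] = archive_json

    def list_job_ids(self) -> List[str]:
        return sorted(self._archives)

    def read(self, job_id: str) -> Optional[str]:
        return self._archives.get(job_id)


class HistoryServerCache:
    """Hypothetical FHS view: a bounded local cache in front of the remote store."""

    def __init__(self, remote: RemoteArchiveStore, cached_retained_jobs: int) -> None:
        self.remote = remote
        # Assumed to be a positive limit, playing the role of
        # historyserver.archive.cached-retained-jobs.
        self.capacity = cached_retained_jobs
        self.known_jobs: Set[str] = set()
        self._cache: "OrderedDict[str, str]" = OrderedDict()

    def poll_once(self) -> List[str]:
        """One refresh cycle: return job IDs that appeared remotely since the last poll."""
        new_ids = [j for j in self.remote.list_job_ids() if j not in self.known_jobs]
        self.known_jobs.update(new_ids)
        return new_ids

    def get(self, job_id: str) -> Optional[str]:
        """Serve an archive locally, fetching it from the remote store on a cache miss."""
        if job_id in self._cache:
            self._cache.move_to_end(job_id)  # mark as most recently used
            return self._cache[job_id]
        archive = self.remote.read(job_id)
        if archive is None:
            return None
        self._cache[job_id] = archive
        if len(self._cache) > self.capacity:
            self._cache.popitem(last=False)  # evict the least recently used archive
        return archive

    def cached_job_ids(self) -> List[str]:
        return list(self._cache)


def run_refresh_loop(fhs: HistoryServerCache, refresh_interval_s: float,
                     stop: threading.Event) -> None:
    """Poll the remote store every refresh_interval_s seconds until stop is set."""
    while not stop.is_set():
        fhs.poll_once()
        stop.wait(refresh_interval_s)


if __name__ == "__main__":
    remote = RemoteArchiveStore()
    for i in range(5):
        remote.write(f"job-{i}", "{}")
    fhs = HistoryServerCache(remote, cached_retained_jobs=2)
    print(fhs.poll_once())  # ['job-0', 'job-1', 'job-2', 'job-3', 'job-4']
    for job_id in ("job-0", "job-1", "job-2"):
        fhs.get(job_id)
    print(fhs.cached_job_ids())  # ['job-1', 'job-2']
```

In this sketch the number of locally cached archives stays bounded while the remote store keeps every job, and polling only discovers new job IDs rather than eagerly expanding each archive locally. In a real server the refresh loop would run on a background thread.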

Hope this clarifies some things.

Best,
- Allison

On Mon, Jan 27, 2025 at 7:10 PM Yanquan Lv <decq12y...@gmail.com> wrote:

> Hi, Allison. Thanks for driving this FLIP.
> I have some questions to confirm:
>
> 1. I can't find any existing configuration named
> `historyserver.archive.cached-retained-jobs`. I guess what you mean is
> changing the existing configuration from `historyserver.archive.retained-jobs`
> to `historyserver.archive.cached-retained-jobs`. If so, and we only limit
> the number of retained jobs stored locally, is the number of retained jobs
> stored remotely unlimited?
> 2. I think it would be better to provide instructions for adding default
> values to HistoryServerOptions.
> 3. Does `historyserver.archive.fs.refresh-interval` apply to both local and
> remote storage simultaneously?
>
> Best,
> Yanquan
>
> On Fri, Jan 17, 2025 at 8:07 AM Allison <achang5...@gmail.com> wrote:
>
> > Hi everyone,
> >
> > I would like to initiate a discussion on the FLIP below, which enhances
> > the Flink History Server to allow greater scalability of the service.
> >
> > Motivation:
> >
> > Currently, the Flink History Server (FHS) is limited in the number of job
> > archives it can serve by the storage capacity of the node that the FHS
> > runs on. Job archives are stored locally in a cache: for each job, a local
> > directory is created and expanded out based on the contents of a single
> > JSON archive file. This not only uses up local storage space, but also,
> > because of how the FHS expands the job archives into a nested directory
> > structure, often exhausts inodes for jobs with a large number of
> > TaskManagers or subtasks. In order to make the FHS more performant, we
> > would like to decouple the FHS's job archive storage from the local cache,
> > so that it can store and fetch job archives from a remote file store.
> >
> > FLIP proposal document:
> >
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP+505%3A+Flink+History+Server+Scability+Improvements%2C+Remote+Data+Store+Fetch+and+Per+Job+Fetch
> >
> > Thanks!
> >
> > Best,
> > - Allison Chang
> >
>
