Re: [Discuss] Let's Session Cluster JobManager take a breather (FLIP-257: Flink JobManager Process Split)

Zheng Yu Chen Thu, 18 Aug 2022 03:03:44 -0700

Thank you for your interest in this proposal. It only takes effect in
Session Cluster mode and does not affect Application Mode. I would like to
answer your questions about usage patterns and practical cases first, and
discuss implementation details later.
On a Session Cluster in production, I run dozens or even hundreds of jobs,
and these are long-running jobs rather than short-running ones. Why don't
they use Application Mode?
   ○ Sharing one cluster saves JVM resources and reduces server costs
   ○ Resources are utilized more fully
   ○ Starting a job in Application Mode means a long wait for resource
allocation (a Session Cluster has already acquired fixed TM and JM
resources at startup)
However, this setup also has the following potential problems:
   ○ Poor isolation between JobMaster threads in the JobManager: when there
are too many jobs, the JobManager is under great pressure. For example,
when running more than 20 jobs, every JobMaster runs as a thread inside the
JobManager process, so one process has to maintain more than 20 of them,
and their thread pools compete with each other for resources to restore
jobs, compile JobGraphs, etc.
   ○ The JobManager's responsibilities are too broad: as Flink develops,
more and more functionality will inevitably run on the JobMaster. As the
title says, in Session Cluster mode, should we treat the JobManager as a
routing-like forwarding component rather than one that integrates a huge
set of functions? Proper splitting reduces the workload of the JobManager.
That situation is what led to the idea in the title. Can we introduce a
JobMasterContainer component to reduce the impact jobs have on each other,
so that long-running jobs can run gracefully on a Session Cluster, instead
of the Session Cluster only being used for rather small/short-lived jobs
(e.g. FlinkSQL queries) or as some kind of dev environment for testing out
job implementations?
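
To make the routing idea a bit more concrete, here is a minimal sketch. It
is not the FLIP-257 design and does not use any Flink API: the class names
RoutingDispatcher and JobMasterContainer, the least-loaded placement rule,
and all method names are assumptions made up for illustration. It only
shows a dispatcher that places each job on an external container and
remembers where to forward later client operations.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a process that hosts JobMasters outside the JobManager. */
final class JobMasterContainer {
    private final String id;
    private final List<String> jobIds = new ArrayList<>();

    JobMasterContainer(String id) { this.id = id; }

    String id() { return id; }

    /** Number of JobMasters currently hosted by this container. */
    int load() { return jobIds.size(); }

    // A real system would start a JobMaster in the container process here;
    // this sketch only records the job id.
    void startJobMaster(String jobId) { jobIds.add(jobId); }
}

/** Hypothetical dispatcher that only routes; it runs no JobMaster itself. */
final class RoutingDispatcher {
    private final List<JobMasterContainer> containers;
    private final Map<String, JobMasterContainer> routes = new HashMap<>();

    RoutingDispatcher(List<JobMasterContainer> containers) {
        if (containers.isEmpty()) {
            throw new IllegalArgumentException("need at least one container");
        }
        this.containers = new ArrayList<>(containers);
    }

    /** Places the job on the least-loaded container and remembers the route. */
    JobMasterContainer submit(String jobId) {
        if (routes.containsKey(jobId)) {
            throw new IllegalStateException("duplicate job: " + jobId);
        }
        JobMasterContainer target = containers.stream()
                .min(Comparator.comparingInt(JobMasterContainer::load))
                .get(); // safe: the constructor rejects an empty list
        target.startJobMaster(jobId);
        routes.put(jobId, target);
        return target;
    }

    /** Later client operations (cancel, savepoint, ...) would be forwarded to this container. */
    JobMasterContainer lookup(String jobId) {
        JobMasterContainer container = routes.get(jobId);
        if (container == null) {
            throw new IllegalArgumentException("unknown job: " + jobId);
        }
        return container;
    }
}
```

With two containers and four submitted jobs, each container ends up hosting
two JobMasters, and the dispatcher itself holds only the routing table.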


What do you think ~

Matthias Pohl <matthias.p...@aiven.io.invalid> wrote on Wed, Aug 17, 2022 at 22:22:

> Hi Conrad,
> thanks for reaching out to the community with your proposal. I looked
> through FLIP-257 [1]. Your motivation sounds interesting. Can you elaborate
> a bit more on the concrete use-cases you have in mind? How do these match
> the use cases of the two favored execution modes, i.e. Flink's Session and
> Application mode?
>
> As mentioned in [2], Application Mode should be preferred for single
> long-running jobs to isolate the resources of each of those jobs from each
> other. In contrast, Session Mode is the natural choice when running rather
> small/short-lived jobs (e.g. FlinkSQL queries) or when deploying some kind
> of dev environment for testing out job implementations. It feels like your
> use-case is somewhere in between a bit? It would be interesting to get a
> better understanding of where you're coming from. Maybe, you could provide
> some workload statistics?
>
> That considered, I guess it's a topic worth looking into. Here are a few
> thoughts after looking into FLIP-257:
> - As far as I can see, the BlobServer is used for sharing configuration
> information (e.g. Classpath information) as part of the ExecutionGraph
> instantiation [3]. The JobGraph is not persisted through the BlobServer but
> rather stored in the JobGraphStore backed by the HighAvailabilityServices
> implementation. The HA side is not really covered in FLIP-257, yet.
> - The approach of having the current Dispatcher living next to the new
> JobMasterDispatcher (that encapsulates the logic around distributing the
> workload onto multiple runners) leaves me with some doubt whether there
> wouldn't be a better separation of concerns here. What about leaving the
> Dispatcher as is but adding some abstraction between
> JobManagerRunner/JobMaster and the Dispatcher that hides the logic around
> whether these instances are "deployed" on the same machine or somewhere
> else.
> - About distributing JobManager workload: The JobManager already utilizes
> leader election for faster recovery. Hence, one can set up multiple
> JobManagers in idle mode which wait to gain leadership and pick up the work
> (i.e. recovering the jobs) of the previously failed JobManager leader. What
> about utilizing this setup: The currently idling JobManagers could be
> utilized to take over some of the workload from the leader. I haven't
> thought this through, yet. But I'm wondering whether that would be a path
> we could go down. This would enable us to still stick to the
> JobManager/TaskManager setup which users are already used to rather than
> introducing another type of cluster node.
> - The JobManager initialization logic is kind of tricky to get your head
> around. There is some overhead, I hope, we could clean up as part of the
> efforts of removing the per-Job Mode from Flink [4]. It was decided to
> deprecate per-Job Mode in Flink 1.15. But we have to stick with it for some
> time (i.e. it's not going to be removed in 1.16) since it's a quite basic
> feature users might rely on. This shouldn't be a blocker. I just wanted to
> mention it to have it in the back of our minds when looking into ways to
> come up with a solid proposal for FLIP-257.
> - My concern is that this FLIP might turn out to be larger than expected
> and that it might be worth cutting it down into smaller chunks with each
> being covered in a separate FLIP down the road if we have some agreement
> and a clearer picture on how this should be tackled.
>
> I'm gonna add Chesnay and David to this discussion.
>
> Best,
> Matthias
>
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-257+Flink+JobManager+Process+Split
> [2]
>
> https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/overview/#deployment-modes
> [3]
>
> https://github.com/apache/flink/blob/9ed70a1e8b5d59abdf9d7673bc5b44d421140ef0/flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/DefaultExecutionGraph.java#L333
> [4] https://lists.apache.org/thread/b8g76cqgtr2c515rd1bs41vy285f317n
>
>
> On Tue, Aug 16, 2022 at 11:43 AM Zheng Yu Chen <jam.gz...@gmail.com>
> wrote:
>
> > Hi community ~
> >
> > I think this title should be quite interesting. The idea is to reduce the
> > workload of the JobManager and make the SessionCluster [2] more stable in
> > the process of running jobs. I designed a plan for splitting the
> > JobManager on FLIP-257 [1]:
> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-257+Flink+JobManager+Process+Split
> > <
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-257+Flink+JobManager+JobMaster+Thread+Split+to+Process
> > >
> >
> > This proposal describes a scheme for splitting the current process, and a
> > new process model that stays compatible with the original one: the
> > internal JobMaster component of the JobManager is split out, and a
> > parameter controls whether the new model is enabled. When the user
> > enables it, the JobMaster runs as an independent service, reducing the
> > workload of the JobManager. A new Dispatcher communicates with the single
> > split-out JobMaster, or with multiple JobMasters, to manage jobs.
> >
> > The main features it provides are:
> >
> >    - After the user submits a job, the JobMaster threads that used to run
> >    in the JobManager are split out; they no longer run in the JobManager,
> >    but in other processes.
> >    - Users can deploy multiple instances of the component mentioned above,
> >    each running multiple JobMaster threads, thereby reducing the workload
> >    of the JobManager
> >
> > Some of the challenging use cases that these features solve are:
> >
> >    - Compatible with the original job running mode (run JobMaster Thread
> >    on JobManager)
> >    - Implement a new Dispatcher that forwards client operations related
> >    to jobs
> >
> >
> >  I would love to hear and address your thoughts and feedback, and if
> > possible drive FLIP-257!
> >
> >
> > [1]
> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-257+Flink+JobManager+Process+Split
> > <
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-257+Flink+JobManager+JobMaster+Thread+Split+to+Process
> > >
> >
> > [2]
> >
> >
> https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/deployment/overview/#session-mode
> >
> >
> > --
> >
> > Have a nice day ~
> >
> > ConradJam
> >
>
