Hi Konstantin, Thanks a lot for starting this discussion! I hope my thoughts and experiences why users use Per-Job Mode, especially in YARN can help: #1. Per-job mode makes managing dependencies easier: I have met some customers who used Per-Job Mode to submit jobs with a lot of local user-defined jars by using '-C' option directly. They do not need to upload these jars to some remote file system(e.g. HDFS) first, which makes their life easier. #2. In YARN mode, currently, there are some limitations of Application Mode: in this jira(https://issues.apache.org/jira/browse/FLINK-24897) that I am working on, we find that YARN Application Mode do not support `usrlib` very well, which makes it hard to use FlinkUserCodeClassLoader to load classes in user-defined jars.
I believe above 2 points, especially #2, can be reassured as we enhance the YARN Application Mode later but I think it is worthwhile to consider dependency management more carefully before we make decisions. Best, Biao Geng Konstantin Knauf <kna...@apache.org> 于2022年1月13日周四 16:32写道: > Hi everyone, > > I would like to discuss and understand if the benefits of having Per-Job > Mode in Apache Flink outweigh its drawbacks. > > > *# Background: Flink's Deployment Modes* > Flink currently has three deployment modes. They differ in the following > dimensions: > * main() method executed on Jobmanager or Client > * dependencies shipped by client or bundled with all nodes > * number of jobs per cluster & relationship between job and cluster > lifecycle* (supported resource providers) > > ## Application Mode > * main() method executed on Jobmanager > * dependencies already need to be available on all nodes > * dedicated cluster for all jobs executed from the same main()-method > (Note: applications with more than one job, currently still significant > limitations like missing high-availability). Technically, a session cluster > dedicated to all jobs submitted from the same main() method. > * supported by standalone, native kubernetes, YARN > > ## Session Mode > * main() method executed in client > * dependencies are distributed from and by the client to all nodes > * cluster is shared by multiple jobs submitted from different clients, > independent lifecycle > * supported by standalone, Native Kubernetes, YARN > > ## Per-Job Mode > * main() method executed in client > * dependencies are distributed from and by the client to all nodes > * dedicated cluster for a single job > * supported by YARN only > > > *# Reasons to Keep** There are use cases where you might need the > combination of a single job per cluster, but main() method execution in the > client. This combination is only supported by per-job mode. > * It currently exists. Existing users will need to migrate to either > session or application mode. > > > *# Reasons to Drop** With Per-Job Mode and Application Mode we have two > modes that for most users probably do the same thing. Specifically, for > those users that don't care where the main() method is executed and want to > submit a single job per cluster. Having two ways to do the same thing is > confusing. > * Per-Job Mode is only supported by YARN anyway. If we keep it, we should > work towards support in Kubernetes and Standalone, too, to reduce special > casing. > * Dropping per-job mode would reduce complexity in the code and allow us > to dedicate more resources to the other two deployment modes. > * I believe with session mode and application mode we have to easily > distinguishable and understandable deployment modes that cover Flink's use > cases: > * session mode: olap-style, interactive jobs/queries, short lived batch > jobs, very small jobs, traditional cluster-centric deployment mode (fits > the "Hadoop world") > * application mode: long-running streaming jobs, large scale & > heterogenous jobs (resource isolation!), application-centric deployment > mode (fits the "Kubernetes world") > > > *# Call to Action* > * Do you use per-job mode? If so, why & would you be able to migrate to > one of the other methods? > * Am I missing any pros/cons? > * Are you in favor of dropping per-job mode midterm? > > Cheers and thank you, > > Konstantin > > -- > > Konstantin Knauf > > https://twitter.com/snntrable > > https://github.com/knaufk >