I am not sure whether a SQL script could also be submitted the same way as a Python job. We would need a sql-runner jar, which acts as the user jar and takes the SQL script as its argument.
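Here is one way to picture the sql-runner jar mentioned above. It is a minimal, dependency-free sketch of the jar's entry point, assuming the script path is passed as the only program argument. The class name `SqlRunner`, the `StatementExecutor` interface and the naive statement-splitting rules are all hypothetical. A real runner would presumably hand each statement to Flink's `TableEnvironment#executeSql`. That call is deliberately left out so the sketch compiles without Flink on the classpath.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical entry point of a "sql-runner" user jar: reads the SQL script
 * passed as its only argument, splits it into statements and hands each one
 * to an executor. Not an existing Flink class.
 */
public class SqlRunner {

    /** Stand-in for whatever actually executes a statement (assumption). */
    interface StatementExecutor {
        void execute(String statement);
    }

    /**
     * Naively splits a script on ';'. Semicolons inside single-quoted literals
     * are kept (a doubled quote such as 'it''s' toggles twice, so it stays
     * correct), and "--" line comments are dropped. Block comments and quoted
     * identifiers are not handled.
     */
    static List<String> splitStatements(String script) {
        List<String> statements = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuote = false;
        int n = script.length();
        int i = 0;
        while (i < n) {
            char c = script.charAt(i);
            if (!inQuote && c == '-' && i + 1 < n && script.charAt(i + 1) == '-') {
                // Skip the comment up to (not including) the next newline.
                int end = script.indexOf('\n', i);
                i = (end < 0) ? n : end;
                continue;
            }
            if (c == '\'') {
                inQuote = !inQuote;
            }
            if (c == ';' && !inQuote) {
                addIfNotBlank(statements, current);
                current.setLength(0);
            } else {
                current.append(c);
            }
            i++;
        }
        addIfNotBlank(statements, current);
        return statements;
    }

    private static void addIfNotBlank(List<String> statements, StringBuilder sb) {
        String statement = sb.toString().trim();
        if (!statement.isEmpty()) {
            statements.add(statement);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: SqlRunner <path-to-script.sql>");
            System.exit(1);
        }
        String script = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        // A real runner would presumably call TableEnvironment#executeSql here;
        // printing keeps this sketch free of Flink dependencies.
        StatementExecutor executor = statement -> System.out.println("executing: " + statement);
        for (String statement : splitStatements(script)) {
            executor.execute(statement);
        }
    }
}
```

If such a class were packaged as the user jar, application mode would run its main() on the JobManager, so the SQL script effectively takes the place of a hand-written main() method. The splitter ignores block comments, quoted identifiers and `--` sequences inside expressions. A production runner would more likely rely on a proper SQL parser.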

./bin/flink run-application \
    --target kubernetes-application \
    -Dkubernetes.cluster-id=<ClusterId> \
    -Dkubernetes.container.image=<FlinkImageName> \
    --sqlFiles /opt/flink/examples/sql/word_count.sql

Best,
Yang

On Wed, 16 Feb 2022 at 20:00, Jark Wu <imj...@gmail.com> wrote:

> I think this mode is still limited and maybe not easy to extend.
> Could the application mode provide an interface to execute, so that
> clients can implement the interface and pass arbitrary parameters
> (e.g. SQL scripts)?
>
> Best,
> Jark
>
> On Wed, 16 Feb 2022 at 18:54, Konstantin Knauf <kna...@apache.org> wrote:
>
> > Hi Jark,
> >
> > I think you are raising a very good point. I think we need an application
> > mode for SQL that would work along the lines of executing a SQL script
> > (incl. init scripts) located in a particular directory in the Docker image.
> > Details to be discussed.
> >
> > Do you think Zeppelin/SQL CLI could work with such a mode for
> > non-interactive queries (interactive queries would use a session cluster)?
> >
> > Best,
> >
> > Konstantin
> >
> > On Sat, Feb 12, 2022 at 4:31 AM Jark Wu <imj...@gmail.com> wrote:
> >
> > > Hi David,
> > >
> > > Zeppelin and SQL CLI also support submitting long-running streaming SQL
> > > jobs, so a session cluster is not a good fit.
> > >
> > > Best,
> > > Jark
> > >
> > > On Fri, 11 Feb 2022 at 22:42, David Morávek <d...@apache.org> wrote:
> > >
> > > > Hi Jark, can you please elaborate on the current need for per-job
> > > > mode for interactive clients (e.g. Zeppelin, which you've mentioned)?
> > > > Aren't these a natural fit for the session cluster?
> > > >
> > > > D.
> > > >
> > > > On Fri, Feb 11, 2022 at 3:25 PM Jark Wu <imj...@gmail.com> wrote:
> > > >
> > > > > Hi Konstantin,
> > > > >
> > > > > I'm not very familiar with the implementation of per-job mode and
> > > > > application mode.
> > > > > But are there any instructions for users about how to migrate
> > > > > platforms/jobs to application mode?
> > > > >
> > > > > IIUC, the biggest difference between the two modes is where the main()
> > > > > method is executed.
> > > > > However, SQL jobs are not jar applications and don't have a main()
> > > > > method.
> > > > > For example, SQL CLI submits SQL jobs by invoking
> > > > > `StreamExecutionEnvironment#executeAsync(StreamGraph)`.
> > > > > How would SQL Client and SQL platforms (e.g. Zeppelin) support
> > > > > application mode?
> > > > >
> > > > > Best,
> > > > > Jark
> > > > >
> > > > > On Fri, 28 Jan 2022 at 23:33, Konstantin Knauf <kna...@apache.org> wrote:
> > > > >
> > > > > > Hi everyone,
> > > > > >
> > > > > > Thank you for sharing your perspectives. I was not aware of
> > > > > > these limitations of per-job mode on YARN. It seems that there is a
> > > > > > general agreement to deprecate per-job mode and to drop it once the
> > > > > > limitations around YARN are resolved. I've started a corresponding
> > > > > > vote in [1].
> > > > > >
> > > > > > Thanks again,
> > > > > >
> > > > > > Konstantin
> > > > > >
> > > > > > [1] https://lists.apache.org/thread/v6oz92dfp95qcox45l0f8393089oyjv4
> > > > > >
> > > > > > On Fri, Jan 28, 2022 at 1:53 PM Ferenc Csaky <ferenc.cs...@pm.me.invalid> wrote:
> > > > > >
> > > > > > > Hi Yang,
> > > > > > >
> > > > > > > Thank you for the clarification. In general, I think we will have
> > > > > > > time to experiment with this before it is removed entirely, and to
> > > > > > > migrate our solution to application mode.
> > > > > > >
> > > > > > > Regards,
> > > > > > > F
> > > > > > >
> > > > > > > On 2022/01/26 02:42:24 Yang Wang wrote:
> > > > > > > > Hi all,
> > > > > > > >
> > > > > > > > I remember the application mode was initially named "cluster mode".
> > > > > > > > By contrast, the per-job mode is the "client mode".
> > > > > > > > So I believe application mode should cover all the functionality
> > > > > > > > of per-job mode except for where the user main code runs.
> > > > > > > > In the containerized or Kubernetes world, application mode is more
> > > > > > > > native and easier to use, since all the Flink and user jars are
> > > > > > > > bundled in the image. I am also in favor of deprecating and
> > > > > > > > removing per-job mode in the long run.
> > > > > > > >
> > > > > > > > @Ferenc
> > > > > > > > IIRC, YARN application mode can ship user jars and dependencies via
> > > > > > > > the "yarn.ship-files" config option. The only limitation is that we
> > > > > > > > cannot ship the user dependencies and load them with the user
> > > > > > > > classloader instead of the parent classloader.
> > > > > > > > FLINK-24897 is trying to fix this by supporting the "usrlib"
> > > > > > > > directory automatically.
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Yang
> > > > > > > >
> > > > > > > > On Tue, 25 Jan 2022 at 22:05, Ferenc Csaky <fe...@pm.me.invalid> wrote:
> > > > > > > >
> > > > > > > > > Hi Konstantin,
> > > > > > > > >
> > > > > > > > > First of all, sorry for the delay. We at Cloudera currently rely
> > > > > > > > > on per-job mode to deploy Flink applications on YARN.
> > > > > > > > >
> > > > > > > > > Specifically, we allow users to upload connector jars and other
> > > > > > > > > artifacts. There are also some default jars that we need to ship.
> > > > > > > > > These are all stored on the local file system of our service's
> > > > > > > > > node. The Flink job is submitted on the users' behalf by our
> > > > > > > > > service, which also specifies the jars to ship.
> > > > > > > > > The service runs on a single node, not on all nodes with Flink
> > > > > > > > > TM/JM, so it would be difficult to manage the jars on every node.
> > > > > > > > >
> > > > > > > > > We are not familiar with the reasoning behind why application mode
> > > > > > > > > currently doesn't ship the user jars, besides the deployment being
> > > > > > > > > faster this way. Would it be possible for application mode to
> > > > > > > > > distribute these (optionally, enabled by some config), or are there
> > > > > > > > > technical limitations?
> > > > > > > > >
> > > > > > > > > For us it is crucial to keep the functionality we currently have
> > > > > > > > > on YARN. We have started to track
> > > > > > > > > https://issues.apache.org/jira/browse/FLINK-24897, which Biao Geng
> > > > > > > > > mentioned as well.
> > > > > > > > >
> > > > > > > > > Considering the above, a near-term removal does not sound good to
> > > > > > > > > us. We can of course live with this feature being deprecated, but
> > > > > > > > > it would be nice to have some time to figure out exactly how we
> > > > > > > > > can use Application Mode and to make the necessary changes if
> > > > > > > > > required.
> > > > > > > > >
> > > > > > > > > Thank you,
> > > > > > > > > F
> > > > > > > > >
> > > > > > > > > On 2022/01/13 08:30:48 Konstantin Knauf wrote:
> > > > > > > > > > Hi everyone,
> > > > > > > > > >
> > > > > > > > > > I would like to discuss and understand whether the benefits of
> > > > > > > > > > having Per-Job Mode in Apache Flink outweigh its drawbacks.
> > > > > > > > > >
> > > > > > > > > > *# Background: Flink's Deployment Modes*
> > > > > > > > > > Flink currently has three deployment modes.
They differ > in > > > the > > > > > > > following > > > > > > > > > > dimensions: > > > > > > > > > > * main() method executed on Jobmanager or Client > > > > > > > > > > * dependencies shipped by client or bundled with all > nodes > > > > > > > > > > * number of jobs per cluster & relationship between job > and > > > > > cluster > > > > > > > > > > lifecycle* (supported resource providers) > > > > > > > > > > > > > > > > > > > > ## Application Mode > > > > > > > > > > * main() method executed on Jobmanager > > > > > > > > > > * dependencies already need to be available on all nodes > > > > > > > > > > * dedicated cluster for all jobs executed from the same > > > > > > main()-method > > > > > > > > > > (Note: applications with more than one job, currently > still > > > > > > > significant > > > > > > > > > > limitations like missing high-availability). > Technically, a > > > > > session > > > > > > > > > cluster > > > > > > > > > > dedicated to all jobs submitted from the same main() > > method. 
> > > > > > > > > > * supported by standalone, native Kubernetes, YARN
> > > > > > > > > >
> > > > > > > > > > ## Session Mode
> > > > > > > > > > * main() method executed in client
> > > > > > > > > > * dependencies are distributed from and by the client to all nodes
> > > > > > > > > > * cluster is shared by multiple jobs submitted from different
> > > > > > > > > > clients, independent lifecycle
> > > > > > > > > > * supported by standalone, native Kubernetes, YARN
> > > > > > > > > >
> > > > > > > > > > ## Per-Job Mode
> > > > > > > > > > * main() method executed in client
> > > > > > > > > > * dependencies are distributed from and by the client to all nodes
> > > > > > > > > > * dedicated cluster for a single job
> > > > > > > > > > * supported by YARN only
> > > > > > > > > >
> > > > > > > > > > *# Reasons to Keep*
> > > > > > > > > > * There are use cases where you might need the combination of a
> > > > > > > > > > single job per cluster with main() method execution in the
> > > > > > > > > > client. This combination is only supported by per-job mode.
> > > > > > > > > > * It currently exists. Existing users will need to migrate to
> > > > > > > > > > either session or application mode.
> > > > > > > > > >
> > > > > > > > > > *# Reasons to Drop*
> > > > > > > > > > * With Per-Job Mode and Application Mode we have two modes that,
> > > > > > > > > > for most users, probably do the same thing: specifically, for
> > > > > > > > > > users who don't care where the main() method is executed and
> > > > > > > > > > want to submit a single job per cluster. Having two ways to do
> > > > > > > > > > the same thing is confusing.
> > > > > > > > > > * Per-Job Mode is only supported by YARN anyway.
> > > > > > > > > > If we keep it, we should work towards supporting it in
> > > > > > > > > > Kubernetes and Standalone, too, to reduce special casing.
> > > > > > > > > > * Dropping per-job mode would reduce complexity in the code and
> > > > > > > > > > allow us to dedicate more resources to the other two deployment
> > > > > > > > > > modes.
> > > > > > > > > > * I believe that with session mode and application mode we have
> > > > > > > > > > two easily distinguishable and understandable deployment modes
> > > > > > > > > > that cover Flink's use cases:
> > > > > > > > > >   * session mode: OLAP-style, interactive jobs/queries,
> > > > > > > > > >   short-lived batch jobs, very small jobs, traditional
> > > > > > > > > >   cluster-centric deployment mode (fits the "Hadoop world")
> > > > > > > > > >   * application mode: long-running streaming jobs, large-scale &
> > > > > > > > > >   heterogeneous jobs (resource isolation!), application-centric
> > > > > > > > > >   deployment mode (fits the "Kubernetes world")
> > > > > > > > > >
> > > > > > > > > > *# Call to Action*
> > > > > > > > > > * Do you use per-job mode? If so, why, and would you be able to
> > > > > > > > > > migrate to one of the other modes?
> > > > > > > > > > * Am I missing any pros/cons?
> > > > > > > > > > * Are you in favor of dropping per-job mode in the mid term?
> > > > > > > > > >
> > > > > > > > > > Cheers and thank you,
> > > > > > > > > >
> > > > > > > > > > Konstantin
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > >
> > > > > > > > > > Konstantin Knauf
> > > > > > > > > >
> > > > > > > > > > https://twitter.com/snntrable
> > > > > > > > > >
> > > > > > > > > > https://github.com/knaufk
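The comparison in the original message can be made concrete with a small model. The sketch below encodes the three modes along the dimensions listed there: where main() runs, who ships dependencies, whether the cluster is dedicated, and which resource providers support the mode as of January 2022. It then asks which modes combine a dedicated cluster with main() executed in the client. It is a hypothetical illustration only. The class and enum names are made up and do not correspond to Flink code.

```java
import java.util.EnumSet;
import java.util.Set;

/**
 * Illustrative model of the three deployment modes as summarized at the start
 * of this thread (January 2022). All names are hypothetical; this is not a
 * Flink API.
 */
public class DeploymentModes {

    enum MainLocation { CLIENT, JOBMANAGER }

    enum ResourceProvider { STANDALONE, NATIVE_KUBERNETES, YARN }

    enum Mode {
        // main() on the JobManager, dependencies pre-installed on all nodes,
        // cluster dedicated to the jobs of one main() method.
        APPLICATION(MainLocation.JOBMANAGER, false, true, EnumSet.allOf(ResourceProvider.class)),
        // main() in the client, client ships dependencies, cluster shared by many clients.
        SESSION(MainLocation.CLIENT, true, false, EnumSet.allOf(ResourceProvider.class)),
        // main() in the client, client ships dependencies, cluster dedicated to a single job.
        PER_JOB(MainLocation.CLIENT, true, true, EnumSet.of(ResourceProvider.YARN));

        final MainLocation mainLocation;
        final boolean clientShipsDependencies;
        final boolean dedicatedCluster;
        final Set<ResourceProvider> providers;

        Mode(MainLocation mainLocation, boolean clientShipsDependencies,
             boolean dedicatedCluster, Set<ResourceProvider> providers) {
            this.mainLocation = mainLocation;
            this.clientShipsDependencies = clientShipsDependencies;
            this.dedicatedCluster = dedicatedCluster;
            this.providers = providers;
        }
    }

    /** Modes that combine a dedicated cluster with main() executed in the client. */
    static Set<Mode> dedicatedClusterWithClientMain() {
        Set<Mode> result = EnumSet.noneOf(Mode.class);
        for (Mode mode : Mode.values()) {
            if (mode.dedicatedCluster && mode.mainLocation == MainLocation.CLIENT) {
                result.add(mode);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints [PER_JOB]: the combination listed under "Reasons to Keep".
        System.out.println(dedicatedClusterWithClientMain());
    }
}
```

Running main() prints `[PER_JOB]`, which matches the point under "Reasons to Keep" that only per-job mode offers this combination. The `providers` field likewise records that per-job mode was YARN-only at the time.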