Hi Yang,

The difference between per-job and application mode is that, as you
described, in per-job mode the main() is executed on the client, while in
application mode the main() is executed on the cluster. I do not think we
have to offer an "application mode" that runs the main() on the client
side, as this is exactly what per-job mode does currently and, as you also
described, it would be redundant.
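
To make the distinction concrete, here is a minimal sketch of where main()
runs in each mode. It is only an illustration: DeploymentMode, MainLocation
and mainLocation are hypothetical names, not Flink classes and not part of
the FLIP.

```java
public class DeploymentModeSketch {

    /** Where the user's main() method is executed (hypothetical type). */
    enum MainLocation { CLIENT, CLUSTER }

    /** The two modes compared in this thread (hypothetical type, not a Flink class). */
    enum DeploymentMode { PER_JOB, APPLICATION }

    /** Maps a deployment mode to the place where main() runs. */
    static MainLocation mainLocation(DeploymentMode mode) {
        switch (mode) {
            case PER_JOB:
                // main() runs on the client; the resulting job graph is shipped to the cluster.
                return MainLocation.CLIENT;
            case APPLICATION:
                // main() runs on the cluster, inside the cluster entry point.
                return MainLocation.CLUSTER;
            default:
                throw new IllegalArgumentException("Unknown mode: " + mode);
        }
    }

    public static void main(String[] args) {
        for (DeploymentMode mode : DeploymentMode.values()) {
            System.out.println(mode + ": main() runs on " + mainLocation(mode));
        }
    }
}
```

An "application mode with main() on the client" would map to the same
location as PER_JOB in this sketch, which is why it would be redundant.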
Sorry if this was not clear in the document.

Cheers,
Kostas

On Mon, Mar 2, 2020 at 3:17 PM Yang Wang <danrtsey...@gmail.com> wrote:
>
> Hi Kostas,
>
> Thanks a lot for your conclusion and for updating the FLIP-85 wiki.
> Currently, I have no more questions about the motivation, approach,
> fault tolerance and the first-phase implementation.
>
> I think the new title "Flink Application Mode" makes a lot of sense.
> Especially for containerized environments, the cluster deploy option
> will be very useful.
>
> Just one concern: how do we introduce this new application mode to our
> users? Each user program (i.e. `main()`) is an application. Currently,
> we intend to only support one `execute()`. So what's the difference
> between per-job and application mode?
>
> For per-job, the user `main()` is always executed on the client side.
> And for application mode, the user `main()` could be executed on the
> client or master side (configured via a CLI option). Right? We need to
> have a clear concept. Otherwise, users will become more and more
> confused.
>
> Best,
> Yang
>
> On Mon, Mar 2, 2020 at 5:58 PM Kostas Kloudas <kklou...@gmail.com> wrote:
>>
>> Hi all,
>>
>> I updated
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-85+Flink+Application+Mode
>> based on the discussion we had here:
>>
>> https://docs.google.com/document/d/1ji72s3FD9DYUyGuKnJoO4ApzV-nSsZa0-bceGXW7Ocw/edit#
>>
>> Please let me know what you think and please keep the discussion in
>> the ML :)
>>
>> Thanks for starting the discussion and I hope that soon we will be
>> able to vote on the FLIP.
>>
>> Cheers,
>> Kostas
>>
>> On Thu, Jan 16, 2020 at 3:40 AM Yang Wang <danrtsey...@gmail.com> wrote:
>> >
>> > Hi all,
>> >
>> > Thanks a lot for the feedback from @Kostas Kloudas. All your concerns
>> > are on point. FLIP-85 is mainly focused on supporting cluster mode
>> > for per-job, since it is more urgent and has many more use cases in
>> > both Yarn and Kubernetes deployments.
>> > For the session cluster, we could have more discussion in a new
>> > thread later.
>> >
>> > #1, How to download the user jars and dependencies for per-job in
>> > cluster mode?
>> > For Yarn, we could register the user jars and dependencies as
>> > LocalResource. They will be distributed by Yarn, and once the
>> > JobManager and TaskManager are launched, the jars already exist.
>> > For Standalone per-job and K8s, we expect that the user jars and
>> > dependencies are built into the image. Or the InitContainer could be
>> > used for downloading. It is natively distributed and we will not
>> > have a bottleneck.
>> >
>> > #2, Job graph recovery
>> > We could have an optimization to store the job graph on the DFS.
>> > However, I suggest that building a new job graph from the
>> > configuration be the default option, since we will not always have a
>> > DFS store when deploying a Flink per-job cluster. Of course, we
>> > assume that using the same configuration (e.g. job_id, user_jar,
>> > main_class, main_args, parallelism, savepoint_settings, etc.) will
>> > yield the same job graph. I think the standalone per-job mode
>> > already has similar behavior.
>> >
>> > #3, What happens with jobs that have multiple execute calls?
>> > Currently, it is really a problem. Even if we use a local client on
>> > the Flink master side, it will behave differently from client mode.
>> > For client mode, if we execute multiple times, then we will deploy
>> > multiple Flink clusters, one for each execute. I am not quite sure
>> > whether that is reasonable. However, I still think using the local
>> > client is a good choice. We could continue the discussion in a new
>> > thread. @Zili Chen <wander4...@gmail.com>, do you want to drive this?
>> >
>> > Best,
>> > Yang
>> >
>> > On Thu, Jan 16, 2020 at 1:55 AM Peter Huang <huangzhenqiu0...@gmail.com> wrote:
>> >
>> > > Hi Kostas,
>> > >
>> > > Thanks for this feedback. I can't agree more with this opinion.
>> > > The cluster mode should be added first in the per-job cluster.
>> > >
>> > > 1) For job cluster implementation
>> > > 1. Job graph recovery from configuration, or store it as a static
>> > > job graph as in the session cluster. I think the static one will
>> > > be better for less recovery time. Let me update the doc for
>> > > details.
>> > >
>> > > 2. For jobs that execute multiple times, I think @Zili Chen
>> > > <wander4...@gmail.com> has proposed the local client solution that
>> > > can actually run the program in the cluster entry point. We can
>> > > put the implementation in the second stage, or even a new FLIP for
>> > > further discussion.
>> > >
>> > > 2) For session cluster implementation
>> > > We can disable the cluster mode for the session cluster in the
>> > > first stage. I agree the jar downloading will be a painful thing.
>> > > We can consider a PoC and performance evaluation first. If the
>> > > end-to-end experience is good enough, then we can consider
>> > > proceeding with the solution.
>> > >
>> > > Looking forward to more opinions from @Yang Wang
>> > > <danrtsey...@gmail.com> @Zili Chen <wander4...@gmail.com> @Dian Fu
>> > > <dian0511...@gmail.com>.
>> > >
>> > > Best Regards
>> > > Peter Huang
>> > >
>> > > On Wed, Jan 15, 2020 at 7:50 AM Kostas Kloudas <kklou...@gmail.com> wrote:
>> > >
>> > >> Hi all,
>> > >>
>> > >> I am writing here as the discussion on the Google Doc seems to be
>> > >> a bit difficult to follow.
>> > >>
>> > >> I think that in order to be able to make progress, it would be
>> > >> helpful to focus on per-job mode for now.
>> > >> The reason is that:
>> > >> 1) making the (unique) JobSubmitHandler responsible for creating
>> > >> the jobgraphs, which includes downloading dependencies, is not an
>> > >> optimal solution
>> > >> 2) even if we put the responsibility on the JobMaster, currently
>> > >> each job has its own JobMaster but they all run on the same
>> > >> process, so we have again a single entity.
>> > >>
>> > >> Of course after this is done, and if we feel comfortable with the
>> > >> solution, then we can go to the session mode.
>> > >>
>> > >> A second comment has to do with fault-tolerance in the per-job,
>> > >> cluster-deploy mode. In the document, it is suggested that upon
>> > >> recovery, the JobMaster of each job re-creates the JobGraph. I am
>> > >> just wondering if it is better to create and store the jobGraph
>> > >> upon submission and only fetch it upon recovery so that we have a
>> > >> static jobGraph.
>> > >>
>> > >> Finally, I have a question which is what happens with jobs that
>> > >> have multiple execute calls? The semantics seem to change compared
>> > >> to the current behaviour, right?
>> > >>
>> > >> Cheers,
>> > >> Kostas
>> > >>
>> > >> On Wed, Jan 8, 2020 at 8:05 PM tison <wander4...@gmail.com> wrote:
>> > >> >
>> > >> > Not always; Yang Wang is also not yet a committer, but he can
>> > >> > join the channel. I cannot find the id by clicking “Add new
>> > >> > member in channel”, so I came to you to ask you to try out the
>> > >> > link. Possibly I will find other ways, but the original purpose
>> > >> > is that the Slack channel is a public area where we discuss
>> > >> > development...
>> > >> >
>> > >> > Best,
>> > >> > tison.
>> > >> >
>> > >> > On Thu, Jan 9, 2020 at 2:44 AM Peter Huang <huangzhenqiu0...@gmail.com> wrote:
>> > >> >
>> > >> > > Hi Tison,
>> > >> > >
>> > >> > > I am not a Flink committer yet. I think I can't join it either.
>> > >> > >
>> > >> > > Best Regards
>> > >> > > Peter Huang
>> > >> > >
>> > >> > > On Wed, Jan 8, 2020 at 9:39 AM tison <wander4...@gmail.com> wrote:
>> > >> > >
>> > >> > > > Hi Peter,
>> > >> > > >
>> > >> > > > Could you try out this link?
>> > >> > > > https://the-asf.slack.com/messages/CNA3ADZPH
>> > >> > > >
>> > >> > > > Best,
>> > >> > > > tison.
>> > >> > > >
>> > >> > > > On Thu, Jan 9, 2020 at 1:22 AM Peter Huang <huangzhenqiu0...@gmail.com> wrote:
>> > >> > > >
>> > >> > > > > Hi Tison,
>> > >> > > > >
>> > >> > > > > I can't join the group with the shared link. Would you
>> > >> > > > > please add me to the group? My Slack account is
>> > >> > > > > huangzhenqiu0825.
>> > >> > > > > Thank you in advance.
>> > >> > > > >
>> > >> > > > > Best Regards
>> > >> > > > > Peter Huang
>> > >> > > > >
>> > >> > > > > On Wed, Jan 8, 2020 at 12:02 AM tison <wander4...@gmail.com> wrote:
>> > >> > > > >
>> > >> > > > > > Hi Peter,
>> > >> > > > > >
>> > >> > > > > > As described above, this effort should get attention
>> > >> > > > > > from the people developing FLIP-73, a.k.a. Executor
>> > >> > > > > > abstractions. I recommend you join the public Slack
>> > >> > > > > > channel[1] for Flink Client API Enhancement, where you
>> > >> > > > > > can try to share your detailed thoughts. It will
>> > >> > > > > > possibly get more concrete attention.
>> > >> > > > > >
>> > >> > > > > > Best,
>> > >> > > > > > tison.
>> > >> > > > > >
>> > >> > > > > > [1]
>> > >> > > > > > https://slack.com/share/IS21SJ75H/Rk8HhUly9FuEHb7oGwBZ33uL/enQtODg2MDYwNjE5MTg3LTA2MjIzNDc1M2ZjZDVlMjdlZjk1M2RkYmJhNjAwMTk2ZDZkODQ4NmY5YmI4OGRhNWJkYTViMTM1NzlmMzc4OWM
>> > >> > > > > >
>> > >> > > > > > On Tue, Jan 7, 2020 at 5:09 AM Peter Huang <huangzhenqiu0...@gmail.com> wrote:
>> > >> > > > > >
>> > >> > > > > > > Dear All,
>> > >> > > > > > >
>> > >> > > > > > > Happy new year! Based on existing feedback from the
>> > >> > > > > > > community, we revised the doc to consider session
>> > >> > > > > > > cluster support, the concrete interface changes
>> > >> > > > > > > needed, and an execution plan. Please take one more
>> > >> > > > > > > round of review at your convenience.
>> > >> > > > > > >
>> > >> > > > > > > https://docs.google.com/document/d/1aAwVjdZByA-0CHbgv16Me-vjaaDMCfhX7TzVVTuifYM/edit#
>> > >> > > > > > >
>> > >> > > > > > > Best Regards
>> > >> > > > > > > Peter Huang
>> > >> > > > > > >
>> > >> > > > > > > On Thu, Jan 2, 2020 at 11:29 AM Peter Huang <huangzhenqiu0...@gmail.com> wrote:
>> > >> > > > > > >
>> > >> > > > > > > > Hi Dian,
>> > >> > > > > > > > Thanks for giving us valuable feedback.
>> > >> > > > > > > >
>> > >> > > > > > > > 1) It's better to have a whole design for this feature
>> > >> > > > > > > > For the suggestion of enabling the cluster mode also
>> > >> > > > > > > > for the session cluster, I think Flink already
>> > >> > > > > > > > supports it.
>> > >> > > > > > > > WebSubmissionExtension already allows users to start
>> > >> > > > > > > > a job with the specified jar by using the web UI.
>> > >> > > > > > > > But we need to enable the feature from the CLI for
>> > >> > > > > > > > both local jars and remote jars. I will align with
>> > >> > > > > > > > Yang Wang first about the details and update the
>> > >> > > > > > > > design doc.
>> > >> > > > > > > >
>> > >> > > > > > > > 2) It's better to consider the convenience for
>> > >> > > > > > > > users, such as debugging
>> > >> > > > > > > >
>> > >> > > > > > > > I am wondering whether we can store the exception
>> > >> > > > > > > > from job graph generation in the application master.
>> > >> > > > > > > > As no streaming graph can be scheduled in this case,
>> > >> > > > > > > > no more TMs will be requested from FlinkRM. If the
>> > >> > > > > > > > AM is still running, users can still query it from
>> > >> > > > > > > > the CLI. As it requires more changes, we can get
>> > >> > > > > > > > some feedback from <aljos...@apache.org> and
>> > >> > > > > > > > @zjf...@gmail.com <zjf...@gmail.com>.
>> > >> > > > > > > >
>> > >> > > > > > > > 3) It's better to consider the impact to the
>> > >> > > > > > > > stability of the cluster
>> > >> > > > > > > >
>> > >> > > > > > > > I agree with Yang Wang's opinion.
>> > >> > > > > > > >
>> > >> > > > > > > > Best Regards
>> > >> > > > > > > > Peter Huang
>> > >> > > > > > > >
>> > >> > > > > > > > On Sun, Dec 29, 2019 at 9:44 PM Dian Fu <dian0511...@gmail.com> wrote:
>> > >> > > > > > > >
>> > >> > > > > > > >> Hi all,
>> > >> > > > > > > >>
>> > >> > > > > > > >> Sorry to jump into this discussion.
>> > >> > > > > > > >> Thanks everyone for the discussion. I'm very
>> > >> > > > > > > >> interested in this topic although I'm not an expert
>> > >> > > > > > > >> in this part. So I'm glad to share my thoughts as
>> > >> > > > > > > >> follows:
>> > >> > > > > > > >>
>> > >> > > > > > > >> 1) It's better to have a whole design for this feature
>> > >> > > > > > > >> As we know, there are two deployment modes: per-job
>> > >> > > > > > > >> mode and session mode. I'm wondering which mode
>> > >> > > > > > > >> really needs this feature. As the design doc
>> > >> > > > > > > >> mentioned, per-job mode is more used for streaming
>> > >> > > > > > > >> jobs and session mode is usually used for batch jobs
>> > >> > > > > > > >> (of course, the job types and the deployment modes
>> > >> > > > > > > >> are orthogonal). Usually a streaming job only needs
>> > >> > > > > > > >> to be submitted once and will run for days or weeks,
>> > >> > > > > > > >> while batch jobs will be submitted more frequently
>> > >> > > > > > > >> than streaming jobs. This means that maybe session
>> > >> > > > > > > >> mode also needs this feature. However, if we support
>> > >> > > > > > > >> this feature in session mode, the application master
>> > >> > > > > > > >> will become the new centralized service (which
>> > >> > > > > > > >> should be solved). So in this case, it's better to
>> > >> > > > > > > >> have a complete design for both per-job mode and
>> > >> > > > > > > >> session mode.
>> > >> > > > > > > >> Furthermore, even if we can do it phase by phase, we
>> > >> > > > > > > >> need to have a whole picture of how it works in both
>> > >> > > > > > > >> per-job mode and session mode.
>> > >> > > > > > > >>
>> > >> > > > > > > >> 2) It's better to consider the convenience for
>> > >> > > > > > > >> users, such as debugging
>> > >> > > > > > > >> After we finish this feature, the job graph will be
>> > >> > > > > > > >> compiled in the application master, which means that
>> > >> > > > > > > >> users cannot easily get the exception message
>> > >> > > > > > > >> synchronously in the job client if there are
>> > >> > > > > > > >> problems during job graph compilation (especially
>> > >> > > > > > > >> for platform users), such as when the resource path
>> > >> > > > > > > >> is incorrect or the user program itself has some
>> > >> > > > > > > >> problems. What I'm thinking is that maybe we should
>> > >> > > > > > > >> throw the exceptions as early as possible (during
>> > >> > > > > > > >> the job submission stage).
>> > >> > > > > > > >>
>> > >> > > > > > > >> 3) It's better to consider the impact to the
>> > >> > > > > > > >> stability of the cluster
>> > >> > > > > > > >> If we perform the compiling in the application
>> > >> > > > > > > >> master, we should consider the impact of the
>> > >> > > > > > > >> compiling errors.
>> > >> > > > > > > >> Although YARN could resume the application master in
>> > >> > > > > > > >> case of failures, in some cases the compiling
>> > >> > > > > > > >> failure may be a waste of cluster resources and may
>> > >> > > > > > > >> impact the stability of the cluster and the other
>> > >> > > > > > > >> jobs in the cluster, such as when the resource path
>> > >> > > > > > > >> is incorrect or the user program itself has some
>> > >> > > > > > > >> problems (in this case, job failover cannot solve
>> > >> > > > > > > >> this kind of problem). In the current
>> > >> > > > > > > >> implementation, the compiling errors are handled on
>> > >> > > > > > > >> the client side and there is no impact on the
>> > >> > > > > > > >> cluster at all.
>> > >> > > > > > > >>
>> > >> > > > > > > >> Regarding 1), it's clearly pointed out in the design
>> > >> > > > > > > >> doc that only per-job mode will be supported.
>> > >> > > > > > > >> However, I think it's better to also consider the
>> > >> > > > > > > >> session mode in the design doc.
>> > >> > > > > > > >> Regarding 2) and 3), I have not seen related
>> > >> > > > > > > >> sections in the design doc. It would be good if we
>> > >> > > > > > > >> could cover them in the design doc.
>> > >> > > > > > > >>
>> > >> > > > > > > >> Feel free to correct me if there is anything I
>> > >> > > > > > > >> misunderstand.
>> > >> > > > > > > >>
>> > >> > > > > > > >> Regards,
>> > >> > > > > > > >> Dian
>> > >> > > > > > > >>
>> > >> > > > > > > >> > On Dec 27, 2019, at 3:13 AM, Peter Huang <huangzhenqiu0...@gmail.com> wrote:
>> > >> > > > > > > >> >
>> > >> > > > > > > >> > Hi Yang,
>> > >> > > > > > > >> >
>> > >> > > > > > > >> > I can't agree more.
>> > >> > > > > > > >> > The effort definitely needs to align with the
>> > >> > > > > > > >> > final goal of FLIP-73. I am thinking about whether
>> > >> > > > > > > >> > we can achieve the goal in two phases.
>> > >> > > > > > > >> >
>> > >> > > > > > > >> > 1) Phase I
>> > >> > > > > > > >> > As the CliFrontend will not be deprecated soon, we
>> > >> > > > > > > >> > can still use the deployMode flag there, pass the
>> > >> > > > > > > >> > program info through the Flink configuration, and
>> > >> > > > > > > >> > use the ClassPathJobGraphRetriever to generate the
>> > >> > > > > > > >> > job graph in the ClusterEntrypoints of Yarn and
>> > >> > > > > > > >> > Kubernetes.
>> > >> > > > > > > >> >
>> > >> > > > > > > >> > 2) Phase II
>> > >> > > > > > > >> > In AbstractJobClusterExecutor, the job graph is
>> > >> > > > > > > >> > generated in the execute function. We can still
>> > >> > > > > > > >> > use the deployMode in it. With deployMode =
>> > >> > > > > > > >> > cluster, the execute function only starts the
>> > >> > > > > > > >> > cluster.
>> > >> > > > > > > >> >
>> > >> > > > > > > >> > When {Yarn/Kubernetes}PerJobClusterEntrypoint
>> > >> > > > > > > >> > starts, it will start the dispatcher first, then
>> > >> > > > > > > >> > we can use a ClusterEnvironment similar to
>> > >> > > > > > > >> > ContextEnvironment to submit the job with jobName
>> > >> > > > > > > >> > to the local dispatcher. For the details, we need
>> > >> > > > > > > >> > more investigation.
>> > >> > > > > > > >> > Let's wait for @Aljoscha Krettek
>> > >> > > > > > > >> > <aljos...@apache.org> and @Till Rohrmann
>> > >> > > > > > > >> > <trohrm...@apache.org>'s feedback after the
>> > >> > > > > > > >> > holiday season.
>> > >> > > > > > > >> >
>> > >> > > > > > > >> > Thank you in advance. Merry Christmas and Happy
>> > >> > > > > > > >> > New Year!!!
>> > >> > > > > > > >> >
>> > >> > > > > > > >> > Best Regards
>> > >> > > > > > > >> > Peter Huang
>> > >> > > > > > > >> >
>> > >> > > > > > > >> > On Wed, Dec 25, 2019 at 1:08 AM Yang Wang <danrtsey...@gmail.com> wrote:
>> > >> > > > > > > >> >
>> > >> > > > > > > >> >> Hi Peter,
>> > >> > > > > > > >> >>
>> > >> > > > > > > >> >> I think we need to reconsider tison's suggestion
>> > >> > > > > > > >> >> seriously. After FLIP-73, the deployJobCluster
>> > >> > > > > > > >> >> has been moved into `JobClusterExecutor#execute`.
>> > >> > > > > > > >> >> It should not be perceived by `CliFrontend`. That
>> > >> > > > > > > >> >> means the user program will *ALWAYS* be executed
>> > >> > > > > > > >> >> on the client side. This is the by-design
>> > >> > > > > > > >> >> behavior. So, we could not just add
>> > >> > > > > > > >> >> `if(client mode) .. else if(cluster mode) ...`
>> > >> > > > > > > >> >> code in `CliFrontend` to bypass the executor. We
>> > >> > > > > > > >> >> need to find a clean way to decouple executing
>> > >> > > > > > > >> >> the user program and deploying the per-job
>> > >> > > > > > > >> >> cluster.
>> > >> > > > > > > >> >> Based on this, we could support executing the
>> > >> > > > > > > >> >> user program on the client or master side.
>> > >> > > > > > > >> >>
>> > >> > > > > > > >> >> Maybe Aljoscha and Jeff could give some good
>> > >> > > > > > > >> >> suggestions.
>> > >> > > > > > > >> >>
>> > >> > > > > > > >> >> Best,
>> > >> > > > > > > >> >> Yang
>> > >> > > > > > > >> >>
>> > >> > > > > > > >> >> On Wed, Dec 25, 2019 at 4:03 AM Peter Huang <huangzhenqiu0...@gmail.com> wrote:
>> > >> > > > > > > >> >>
>> > >> > > > > > > >> >>> Hi Jingjing,
>> > >> > > > > > > >> >>>
>> > >> > > > > > > >> >>> The improvement proposed is a deployment option
>> > >> > > > > > > >> >>> for the CLI. For SQL-based Flink applications,
>> > >> > > > > > > >> >>> it is more convenient to use the existing model
>> > >> > > > > > > >> >>> in SqlClient, in which the job graph is
>> > >> > > > > > > >> >>> generated within SqlClient. After adding the
>> > >> > > > > > > >> >>> delayed job graph generation, I think no change
>> > >> > > > > > > >> >>> is needed on your side.
>> > >> > > > > > > >> >>>
>> > >> > > > > > > >> >>> Best Regards
>> > >> > > > > > > >> >>> Peter Huang
>> > >> > > > > > > >> >>>
>> > >> > > > > > > >> >>> On Wed, Dec 18, 2019 at 6:01 AM jingjing bai <baijingjing7...@gmail.com> wrote:
>> > >> > > > > > > >> >>>
>> > >> > > > > > > >> >>>> hi peter:
>> > >> > > > > > > >> >>>> We have extended SqlClient to support
>> > >> > > > > > > >> >>>> submitting SQL jobs from the web, based on
>> > >> > > > > > > >> >>>> Flink 1.9.
>> > >> > > > > > > >> >>>> We also support submitting to Yarn in per-job
>> > >> > > > > > > >> >>>> mode. In this case, the job graph is generated
>> > >> > > > > > > >> >>>> on the client side. I think this discussion is
>> > >> > > > > > > >> >>>> mainly about improving API programs, but in my
>> > >> > > > > > > >> >>>> case there is no jar to upload, only a SQL
>> > >> > > > > > > >> >>>> string. Do you have more suggestions to improve
>> > >> > > > > > > >> >>>> the SQL mode, or is it only a switch for API
>> > >> > > > > > > >> >>>> programs?
>> > >> > > > > > > >> >>>>
>> > >> > > > > > > >> >>>> best
>> > >> > > > > > > >> >>>> bai jj
>> > >> > > > > > > >> >>>>
>> > >> > > > > > > >> >>>> On Wed, Dec 18, 2019 at 7:21 PM Yang Wang <danrtsey...@gmail.com> wrote:
>> > >> > > > > > > >> >>>>
>> > >> > > > > > > >> >>>>> I just want to revive this discussion.
>> > >> > > > > > > >> >>>>>
>> > >> > > > > > > >> >>>>> Recently, I have been thinking about how to
>> > >> > > > > > > >> >>>>> natively run a Flink per-job cluster on
>> > >> > > > > > > >> >>>>> Kubernetes. The per-job mode on Kubernetes is
>> > >> > > > > > > >> >>>>> very different from that on Yarn. And we will
>> > >> > > > > > > >> >>>>> have the same deployment requirements for the
>> > >> > > > > > > >> >>>>> client and entry point.
>> > >> > > > > > > >> >>>>>
>> > >> > > > > > > >> >>>>> 1. The Flink client does not always need a
>> > >> > > > > > > >> >>>>> local jar to start a Flink per-job cluster. We
>> > >> > > > > > > >> >>>>> could support multiple schemes.
>> > >> > > > > > > >> >>>>> For example, file:///path/of/my.jar means a
>> > >> > > > > > > >> >>>>> jar located on the client side,
>> > >> > > > > > > >> >>>>> hdfs://myhdfs/user/myname/flink/my.jar means a
>> > >> > > > > > > >> >>>>> jar located on a remote HDFS, and
>> > >> > > > > > > >> >>>>> local:///path/in/image/my.jar means a jar
>> > >> > > > > > > >> >>>>> located on the jobmanager side.
>> > >> > > > > > > >> >>>>>
>> > >> > > > > > > >> >>>>> 2. Support running the user program on the
>> > >> > > > > > > >> >>>>> master side. This also means the entry point
>> > >> > > > > > > >> >>>>> will generate the job graph on the master
>> > >> > > > > > > >> >>>>> side. We could use the
>> > >> > > > > > > >> >>>>> ClasspathJobGraphRetriever or start a local
>> > >> > > > > > > >> >>>>> Flink client to achieve this purpose.
>> > >> > > > > > > >> >>>>>
>> > >> > > > > > > >> >>>>> cc tison, Aljoscha & Kostas: do you think this
>> > >> > > > > > > >> >>>>> is the right direction we need to work in?
>> > >> > > > > > > >> >>>>>
>> > >> > > > > > > >> >>>>> On Thu, Dec 12, 2019 at 4:48 PM tison <wander4...@gmail.com> wrote:
>> > >> > > > > > > >> >>>>>
>> > >> > > > > > > >> >>>>>> A quick idea is that we separate the
>> > >> > > > > > > >> >>>>>> deployment from the user program, so that it
>> > >> > > > > > > >> >>>>>> is always done outside the program.
>> > >> > > > > > > >> >>>>>> When the user program is executed, there is
>> > >> > > > > > > >> >>>>>> always a ClusterClient that communicates with
>> > >> > > > > > > >> >>>>>> an existing cluster, remote or local. It will
>> > >> > > > > > > >> >>>>>> be another thread, so this is just for your
>> > >> > > > > > > >> >>>>>> information.
>> > >> > > > > > > >> >>>>>>
>> > >> > > > > > > >> >>>>>> Best,
>> > >> > > > > > > >> >>>>>> tison.
>> > >> > > > > > > >> >>>>>>
>> > >> > > > > > > >> >>>>>> On Thu, Dec 12, 2019 at 4:40 PM tison <wander4...@gmail.com> wrote:
>> > >> > > > > > > >> >>>>>>
>> > >> > > > > > > >> >>>>>>> Hi Peter,
>> > >> > > > > > > >> >>>>>>>
>> > >> > > > > > > >> >>>>>>> Another concern I realized recently is that
>> > >> > > > > > > >> >>>>>>> with the current Executors abstraction
>> > >> > > > > > > >> >>>>>>> (FLIP-73), I'm afraid that the user program
>> > >> > > > > > > >> >>>>>>> is designed to ALWAYS run on the client
>> > >> > > > > > > >> >>>>>>> side. Specifically, we deploy the job in the
>> > >> > > > > > > >> >>>>>>> executor when env.execute is called. This
>> > >> > > > > > > >> >>>>>>> abstraction possibly prevents Flink from
>> > >> > > > > > > >> >>>>>>> running the user program on the cluster
>> > >> > > > > > > >> >>>>>>> side.
>> > >> > > > > > > >> >>>>>>>
>> > >> > > > > > > >> >>>>>>> For your proposal, in this case we have
>> > >> > > > > > > >> >>>>>>> already compiled the program and run it on
>> > >> > > > > > > >> >>>>>>> the client side; even if we deploy a cluster
>> > >> > > > > > > >> >>>>>>> and retrieve the job graph from program
>> > >> > > > > > > >> >>>>>>> metadata, it doesn't make much sense.
>> > >> > > > > > > >> >>>>>>>
>> > >> > > > > > > >> >>>>>>> cc Aljoscha & Kostas what do you think about
>> > >> > > > > > > >> >>>>>>> this constraint?
>> > >> > > > > > > >> >>>>>>>
>> > >> > > > > > > >> >>>>>>> Best,
>> > >> > > > > > > >> >>>>>>> tison.
>> > >> > > > > > > >> >>>>>>>
>> > >> > > > > > > >> >>>>>>> On Tue, Dec 10, 2019 at 12:45 PM Peter Huang <huangzhenqiu0...@gmail.com> wrote:
>> > >> > > > > > > >> >>>>>>>
>> > >> > > > > > > >> >>>>>>>> Hi Tison,
>> > >> > > > > > > >> >>>>>>>>
>> > >> > > > > > > >> >>>>>>>> Yes, you are right. I think I made the
>> > >> > > > > > > >> >>>>>>>> wrong argument in the doc. Basically, the
>> > >> > > > > > > >> >>>>>>>> packaging jar problem is only for platform
>> > >> > > > > > > >> >>>>>>>> users. In our internal deploy service, we
>> > >> > > > > > > >> >>>>>>>> further optimized the deployment latency by
>> > >> > > > > > > >> >>>>>>>> letting users package flink-runtime
>> > >> > > > > > > >> >>>>>>>> together with the uber jar, so that we
>> > >> > > > > > > >> >>>>>>>> don't need to consider multiple Flink
>> > >> > > > > > > >> >>>>>>>> version support for now.
>> > >> > > > > > > >> >>>>>>>> In the session client mode, Flink libs will
>> > >> > > > > > > >> >>>>>>>> be shipped anyway as local resources of
>> > >> > > > > > > >> >>>>>>>> Yarn, so users actually don't need to
>> > >> > > > > > > >> >>>>>>>> package those libs into the job jar.
>> > >> > > > > > > >> >>>>>>>>
>> > >> > > > > > > >> >>>>>>>> Best Regards
>> > >> > > > > > > >> >>>>>>>> Peter Huang
>> > >> > > > > > > >> >>>>>>>>
>> > >> > > > > > > >> >>>>>>>> On Mon, Dec 9, 2019 at 8:35 PM tison <wander4...@gmail.com> wrote:
>> > >> > > > > > > >> >>>>>>>>
>> > >> > > > > > > >> >>>>>>>>>> 3. What do you mean about the package? Do
>> > >> > > > > > > >> >>>>>>>>>> users need to compile their jars
>> > >> > > > > > > >> >>>>>>>>>> including flink-clients,
>> > >> > > > > > > >> >>>>>>>>>> flink-optimizer, flink-table codes?
>> > >> > > > > > > >> >>>>>>>>>
>> > >> > > > > > > >> >>>>>>>>> The answer should be no because they exist
>> > >> > > > > > > >> >>>>>>>>> in the system classpath.
>> > >> > > > > > > >> >>>>>>>>>
>> > >> > > > > > > >> >>>>>>>>> Best,
>> > >> > > > > > > >> >>>>>>>>> tison.
>> > >> > > > > > > >> >>>>>>>>>
>> > >> > > > > > > >> >>>>>>>>> On Tue, Dec 10, 2019 at 12:18 PM Yang Wang <danrtsey...@gmail.com> wrote:
>> > >> > > > > > > >> >>>>>>>>>
>> > >> > > > > > > >> >>>>>>>>>> Hi Peter,
>> > >> > > > > > > >> >>>>>>>>>>
>> > >> > > > > > > >> >>>>>>>>>> Thanks a lot for starting this discussion.
I think this is a very useful feature.

This is not only relevant for YARN. I am focused on the Flink on Kubernetes
integration and have come across the same problem. I do not want the job
graph generated on the client side. Instead, the user jars are built into a
user-defined image. When the job manager is launched, we just need to
generate the job graph based on the local user jars.

I have some small suggestions about this.

1. `ProgramJobGraphRetriever` is very similar to
`ClasspathJobGraphRetriever`. The difference is that the former needs
`ProgramMetadata` and the latter needs some arguments. Is it possible to
have a unified `JobGraphRetriever` to support both?
2. Is it possible to not use a local user jar to start a per-job cluster?
In your case, the user jars already exist on HDFS and we do need to
download the jars to the deployer service. Currently, we always need a
local user jar to start a Flink cluster. It would be great if we could
support remote user jars.

> In the implementation, we assume users package flink-clients,
> flink-optimizer, flink-table together within the job jar. Otherwise, the
> job graph generation within JobClusterEntryPoint will fail.

3. What do you mean about the package? Do users need to compile their jars
including flink-clients, flink-optimizer, flink-table codes?

Best,
Yang

On Tue, Dec 10, 2019 at 2:37 AM Peter Huang <huangzhenqiu0...@gmail.com> wrote:

Dear All,

Recently, the Flink community started to improve the YARN cluster
descriptor to make the job jar and config files configurable from the CLI.
This improves the flexibility of Flink deployment in YARN per-job mode. For
platform users who manage tens of hundreds of streaming pipelines for the
whole org or company, we found that job graph generation on the client side
is another pain point. Thus, we want to propose a configurable feature for
FlinkYarnSessionCli.
The feature allows users to choose to generate the job graph in the Flink
ClusterEntryPoint, so that the job jar doesn't need to be available locally
for job graph generation. The proposal is organized as a FLIP:

https://cwiki.apache.org/confluence/display/FLINK/FLIP-85+Delayed+JobGraph+Generation

Any questions and suggestions are welcome. Thank you in advance.

Best Regards
Peter Huang