Thanks for the reply, tison and Yang.

Regarding the public interface, is the "-R/--remote" option the only change? Will users also need to provide a remote location to upload and store the jars, and a list of jars as dependencies to be uploaded?
It would be important that the Public Interface section in the FLIP includes all the user-facing changes, including the CLI, configuration, metrics, etc. Can we update the FLIP to include the conclusions we reached here on the ML?

Thanks,

Jiangjie (Becket) Qin

On Mon, Mar 9, 2020 at 11:59 AM Yang Wang <danrtsey...@gmail.com> wrote:

> Hi Becket,
>
> Thanks for jumping in and sharing your concerns. I second tison's answer and just have a few additions.
>
> > job submission interface
>
> This FLIP will introduce an interface for running the user `main()` on the cluster, named “ProgramDeployer”. However, it is not a public interface. It will be used in `CliFrontend` when the remote deploy option (-R/--remote-deploy) is specified. So the only change on the user side is the CLI option.
>
> > How to fetch the jars?
>
> Both a “local path” and a “DFS path” could be supported for fetching the user jars and dependencies. As tison said, we could ship the user jar and dependencies from the client side to HDFS and use the entrypoint to fetch them.
>
> We also have some other practical ways to use the new “application mode”.
> 1. Upload the user jars and dependencies to the DFS (e.g. HDFS, S3, Aliyun OSS) manually or through some external deployer system. For K8s, the user jars and dependencies could also be built into the Docker image.
> 2. Specify the remote/local user jar and dependencies in `flink run`. Usually this could also be done by the external deployer system.
> 3. When the `ClusterEntrypoint` is launched, it will fetch the jars and files automatically. We do not need any specific fetcher implementation, since we could leverage the Flink `FileSystem` to do this.
>
> Best,
> Yang
>
> On Mon, Mar 9, 2020 at 11:34 AM tison <wander4...@gmail.com> wrote:
>
>> Hi Becket,
>>
>> Thanks for your attention to FLIP-85! I have answered your questions inline.
>>
>> 1. What exactly will the job submission interface look like after this FLIP? The FLIP template has a Public Interface section, but it was removed from this FLIP.
>>
>> As Yang mentioned earlier in this thread:
>>
>> From the user's perspective, only a `-R/--remote-deploy` CLI option is visible. They are not aware of the application mode.
>>
>> 2. How will the new ClusterEntrypoint fetch the jars from external storage? What external storage will be supported out of the box? Will this "jar fetcher" be pluggable? If so, what does the API look like and how will users specify the custom "jar fetcher"?
>>
>> It depends, actually. Here are several points:
>>
>> i. Currently, shipping user files is handled by Flink, so dependency fetching can be handled by Flink as well.
>> ii. Currently, we only support local file system shipfiles. In Application Mode, to support meaningful jar fetching, we should first let users configure a richer shipfiles schema.
>> iii. Dependency fetching varies between deployments. That is, on YARN the convention is to go through HDFS; on Kubernetes the convention is a configured resource server, with fetching done by an initContainer.
>>
>> Thus, in the first phase of Application Mode, dependency fetching is handled entirely within Flink.
>>
>> 3. It sounds like in this FLIP the "session cluster" running the application has the same lifecycle as the user application. How will the session cluster be torn down after the application finishes? Will the ClusterEntrypoint do that? Will there be an option of not tearing the cluster down?
>>
>> The precondition for tearing down the cluster is *both*
>>
>> i. the user main has reached its end
>> ii. all submitted jobs (currently, at most one) have reached a globally terminal state
>>
>> As for the "how", it is an implementation topic, but conceptually it is the ClusterEntrypoint's responsibility.
>>
>> > Will there be an option of not tearing the cluster down?
>>
>> I think the answer is "No", because the cluster is designed to be bound to an Application. User logic that communicates with the job is always in its `main`, and for historical information we have the history server.
>>
>> Best,
>> tison.
>>
>> On Mon, Mar 9, 2020 at 8:12 AM Becket Qin <becket....@gmail.com> wrote:
>>
>>> Hi Peter and Kostas,
>>>
>>> Thanks for creating this FLIP. Moving the JobGraph compilation to the cluster makes a lot of sense to me. FLIP-40 had exactly the same idea, but is currently dormant and can probably be superseded by this FLIP. After reading the FLIP, I still have a few questions.
>>>
>>> 1. What exactly will the job submission interface look like after this FLIP? The FLIP template has a Public Interface section, but it was removed from this FLIP.
>>> 2. How will the new ClusterEntrypoint fetch the jars from external storage? What external storage will be supported out of the box? Will this "jar fetcher" be pluggable? If so, what does the API look like and how will users specify the custom "jar fetcher"?
>>> 3. It sounds like in this FLIP the "session cluster" running the application has the same lifecycle as the user application. How will the session cluster be torn down after the application finishes? Will the ClusterEntrypoint do that? Will there be an option of not tearing the cluster down?
>>>
>>> Maybe they have been discussed in the ML earlier, but I think they should be part of the FLIP as well.
>>>
>>> Thanks,
>>>
>>> Jiangjie (Becket) Qin
>>>
>>> On Thu, Mar 5, 2020 at 10:09 PM Kostas Kloudas <kklou...@gmail.com> wrote:
>>>
>>>> Also from my side +1 to start voting.
>>>>
>>>> Cheers,
>>>> Kostas
>>>>
>>>> On Thu, Mar 5, 2020 at 7:45 AM tison <wander4...@gmail.com> wrote:
>>>> >
>>>> > +1 to start voting.
>>>> >
>>>> > Best,
>>>> > tison.
>>>> >
>>>> > On Thu, Mar 5, 2020 at 2:29 PM Yang Wang <danrtsey...@gmail.com> wrote:
>>>> >>
>>>> >> Hi Peter,
>>>> >> Thanks a lot for your response.
>>>> >>
>>>> >> Hi all @Kostas Kloudas @Zili Chen @Peter Huang @Rong Rong
>>>> >> It seems that we have reached an agreement. The “application mode” is regarded as an enhanced “per-job” mode. It is orthogonal to “cluster deploy”. Currently, we bind “per-job” to `run-user-main-on-client` and “application mode” to `run-user-main-on-cluster`.
>>>> >>
>>>> >> Do you have other concerns about moving FLIP-85 to voting?
>>>> >>
>>>> >> Best,
>>>> >> Yang
>>>> >>
>>>> >> On Thu, Mar 5, 2020 at 12:48 PM Peter Huang <huangzhenqiu0...@gmail.com> wrote:
>>>> >>>
>>>> >>> Hi Yang and Kostas,
>>>> >>>
>>>> >>> Thanks for the clarification. It makes more sense to me if the long-term goal is to replace per-job mode with application mode in the future (once multiple execute calls can be supported). Before that, it will be better to keep the concept of application mode internal. As Yang suggested, users only need to use a `-R/--remote-deploy` CLI option to launch a per-job cluster with the main function executed in the cluster entry point. +1 for the execution plan.
>>>> >>>
>>>> >>> Best Regards
>>>> >>> Peter Huang
>>>> >>>
>>>> >>> On Tue, Mar 3, 2020 at 7:11 AM Yang Wang <danrtsey...@gmail.com> wrote:
>>>> >>>>
>>>> >>>> Hi Peter,
>>>> >>>>
>>>> >>>> Having the application mode does not mean we will drop the cluster-deploy option. I just want to share some thoughts about “Application Mode”.
>>>> >>>>
>>>> >>>> 1. The application mode could cover the per-job semantics. Its lifecycle is bound to the user `main()`, and all the jobs in the user main will be executed in the same Flink cluster. In the first phase of the FLIP-85 implementation, running the user main on the cluster side could be supported in application mode.
>>>> >>>>
>>>> >>>> 2. Maybe in the future, we will also need to support multiple `execute()` calls on the client side in the same Flink cluster. Then the per-job mode will evolve into application mode.
>>>> >>>>
>>>> >>>> 3. From the user's perspective, only a `-R/--remote-deploy` CLI option is visible. They are not aware of the application mode.
>>>> >>>>
>>>> >>>> 4. In the first phase, the application mode works as “per-job” (only one job in the user main). We just leave more potential for the future.
>>>> >>>>
>>>> >>>> I am not against calling it “cluster deploy mode” if you all think it is clearer for users.
>>>> >>>>
>>>> >>>> Best,
>>>> >>>> Yang
>>>> >>>>
>>>> >>>> On Tue, Mar 3, 2020 at 6:49 PM Kostas Kloudas <kklou...@gmail.com> wrote:
>>>> >>>>>
>>>> >>>>> Hi Peter,
>>>> >>>>>
>>>> >>>>> I understand your point. This is why I was also a bit torn about the name, and my proposal was somewhat aligned with yours (something along the lines of "cluster deploy" mode).
>>>> >>>>>
>>>> >>>>> But many of the other participants in the discussion suggested "Application Mode". I think the reasoning is that the user's Application is now more self-contained. It will be submitted to the cluster and the user can just disconnect. In addition, as discussed briefly in the doc, in the future there may be better support for multi-execute applications, which will bring us one step closer to the true "Application Mode". But this is how I interpreted their arguments; of course they can also express their thoughts on the topic :)
>>>> >>>>>
>>>> >>>>> Cheers,
>>>> >>>>> Kostas
>>>> >>>>>
>>>> >>>>> On Mon, Mar 2, 2020 at 6:15 PM Peter Huang <huangzhenqiu0...@gmail.com> wrote:
>>>> >>>>> >
>>>> >>>>> > Hi Kostas,
>>>> >>>>> >
>>>> >>>>> > Thanks for updating the wiki. We have aligned with the implementations in the doc. But I feel the naming is still a little confusing from a user's perspective. It is well known that Flink supports per-job clusters and session clusters. That concept sits at the layer of how a job is managed within Flink. The method introduced so far is a kind of mix of job and session cluster, made as a compromise on implementation complexity. We probably don't need to label it as Application Mode at the same layer as per-job cluster and session cluster. Conceptually, I think it is still a cluster mode implementation for the per-job cluster.
>>>> >>>>> >
>>>> >>>>> > To minimize user confusion, I think it would be better to make it just an option of the per-job cluster for each type of cluster manager. What do you think?
>>>> >>>>> >
>>>> >>>>> > Best Regards
>>>> >>>>> > Peter Huang
>>>> >>>>> >
>>>> >>>>> > On Mon, Mar 2, 2020 at 7:22 AM Kostas Kloudas <kklou...@gmail.com> wrote:
>>>> >>>>> >>
>>>> >>>>> >> Hi Yang,
>>>> >>>>> >>
>>>> >>>>> >> The difference between per-job and application mode is that, as you described, in per-job mode the main is executed on the client, while in application mode the main is executed on the cluster. I do not think we have to offer "application mode" with the main running on the client side, as this is exactly what the per-job mode does currently and, as you also described, it would be redundant.
>>>> >>>>> >>
>>>> >>>>> >> Sorry if this was not clear in the document.
>>>> >>>>> >>
>>>> >>>>> >> Cheers,
>>>> >>>>> >> Kostas
>>>> >>>>> >>
>>>> >>>>> >> On Mon, Mar 2, 2020 at 3:17 PM Yang Wang <danrtsey...@gmail.com> wrote:
>>>> >>>>> >> >
>>>> >>>>> >> > Hi Kostas,
>>>> >>>>> >> >
>>>> >>>>> >> > Thanks a lot for your conclusion and for updating the FLIP-85 wiki. Currently, I have no more questions about the motivation, approach, fault tolerance, or the first-phase implementation.
>>>> >>>>> >> >
>>>> >>>>> >> > The new title "Flink Application Mode" makes a lot of sense to me. Especially for containerized environments, the cluster deploy option will be very useful.
>>>> >>>>> >> >
>>>> >>>>> >> > Just one concern: how do we introduce this new application mode to our users? Each user program (i.e. `main()`) is an application. Currently, we intend to support only one `execute()`. So what's the difference between per-job and application mode?
>>>> >>>>> >> >
>>>> >>>>> >> > For per-job, the user `main()` is always executed on the client side. For application mode, the user `main()` could be executed on the client or the master side (configured via a CLI option). Right? We need to have a clear concept. Otherwise, users will become more and more confused.
>>>> >>>>> >> >
>>>> >>>>> >> > Best,
>>>> >>>>> >> > Yang
>>>> >>>>> >> >
>>>> >>>>> >> > On Mon, Mar 2, 2020 at 5:58 PM Kostas Kloudas <kklou...@gmail.com> wrote:
>>>> >>>>> >> >>
>>>> >>>>> >> >> Hi all,
>>>> >>>>> >> >>
>>>> >>>>> >> >> I updated https://cwiki.apache.org/confluence/display/FLINK/FLIP-85+Flink+Application+Mode based on the discussion we had here:
>>>> >>>>> >> >>
>>>> >>>>> >> >> https://docs.google.com/document/d/1ji72s3FD9DYUyGuKnJoO4ApzV-nSsZa0-bceGXW7Ocw/edit#
>>>> >>>>> >> >>
>>>> >>>>> >> >> Please let me know what you think and please keep the discussion in the ML :)
>>>> >>>>> >> >>
>>>> >>>>> >> >> Thanks for starting the discussion and I hope that soon we will be able to vote on the FLIP.
>>>> >>>>> >> >>
>>>> >>>>> >> >> Cheers,
>>>> >>>>> >> >> Kostas
>>>> >>>>> >> >>
>>>> >>>>> >> >> On Thu, Jan 16, 2020 at 3:40 AM Yang Wang <danrtsey...@gmail.com> wrote:
>>>> >>>>> >> >> >
>>>> >>>>> >> >> > Hi all,
>>>> >>>>> >> >> >
>>>> >>>>> >> >> > Thanks a lot for the feedback from @Kostas Kloudas. All your concerns are on point. FLIP-85 is mainly focused on supporting cluster mode for per-job, since it is more urgent and has many more use cases in both Yarn and Kubernetes deployments. For session cluster, we could have more discussion in a new thread later.
>>>> >>>>> >> >> >
>>>> >>>>> >> >> > #1, How to download the user jars and dependencies for per-job in cluster mode?
>>>> >>>>> >> >> > For Yarn, we could register the user jars and dependencies as LocalResources. They will be distributed by Yarn, and once the JobManager and TaskManager are launched, the jars already exist.
>>>> >>>>> >> >> > For Standalone per-job and K8s, we expect that the user jars and dependencies are built into the image. Alternatively, the InitContainer could be used for downloading. It is natively distributed, so we will not have a bottleneck.
>>>> >>>>> >> >> >
>>>> >>>>> >> >> > #2, Job graph recovery
>>>> >>>>> >> >> > We could have an optimization to store the job graph on the DFS. However, I suggest that building a new job graph from the configuration be the default option, since we will not always have a DFS store when deploying a Flink per-job cluster. Of course, we assume that using the same configuration (e.g. job_id, user_jar, main_class, main_args, parallelism, savepoint_settings, etc.) will produce the same job graph. I think the standalone per-job mode already has similar behavior.
>>>> >>>>> >> >> >
>>>> >>>>> >> >> > #3, What happens with jobs that have multiple execute calls?
>>>> >>>>> >> >> > Currently, it is really a problem. Even if we use a local client on the Flink master side, it will behave differently from client mode. In client mode, if we execute multiple times, we will deploy a separate Flink cluster for each execute. I am not quite sure whether that is reasonable. However, I still think using the local client is a good choice. We could continue the discussion in a new thread. @Zili Chen <wander4...@gmail.com> Do you want to drive this?
>>>> >>>>> >> >> >
>>>> >>>>> >> >> > Best,
>>>> >>>>> >> >> > Yang
>>>> >>>>> >> >> >
>>>> >>>>> >> >> > On Thu, Jan 16, 2020 at 1:55 AM Peter Huang <huangzhenqiu0...@gmail.com> wrote:
>>>> >>>>> >> >> >
>>>> >>>>> >> >> > > Hi Kostas,
>>>> >>>>> >> >> > >
>>>> >>>>> >> >> > > Thanks for this feedback. I can't agree more. The cluster mode should be added first in the per-job cluster.
>>>> >>>>> >> >> > >
>>>> >>>>> >> >> > > 1) For job cluster implementation
>>>> >>>>> >> >> > > 1. Job graph recovery from configuration, or store it as a static job graph as in the session cluster. I think the static one will be better because of its shorter recovery time. Let me update the doc with details.
>>>> >>>>> >> >> > >
>>>> >>>>> >> >> > > 2. For jobs that execute multiple times, I think @Zili Chen <wander4...@gmail.com> has proposed the local client solution that can actually run the program in the cluster entry point. We can put the implementation in the second stage, or even in a new FLIP for further discussion.
>>>> >>>>> >> >> > >
>>>> >>>>> >> >> > > 2) For session cluster implementation
>>>> >>>>> >> >> > > We can disable the cluster mode for the session cluster in the first stage. I agree the jar downloading will be a painful thing. We can consider a PoC and performance evaluation first. If the end-to-end experience is good enough, then we can consider proceeding with the solution.
>>>> >>>>> >> >> > >
>>>> >>>>> >> >> > > Looking forward to more opinions from @Yang Wang <danrtsey...@gmail.com> @Zili Chen <wander4...@gmail.com> @Dian Fu <dian0511...@gmail.com>.
>>>> >>>>> >> >> > >
>>>> >>>>> >> >> > > Best Regards
>>>> >>>>> >> >> > > Peter Huang
>>>> >>>>> >> >> > >
>>>> >>>>> >> >> > > On Wed, Jan 15, 2020 at 7:50 AM Kostas Kloudas <kklou...@gmail.com> wrote:
>>>> >>>>> >> >> > >
>>>> >>>>> >> >> > >> Hi all,
>>>> >>>>> >> >> > >>
>>>> >>>>> >> >> > >> I am writing here as the discussion on the Google Doc seems to be a bit difficult to follow.
>>>> >>>>> >> >> > >>
>>>> >>>>> >> >> > >> I think that in order to be able to make progress, it would be helpful to focus on per-job mode for now. The reason is that:
>>>> >>>>> >> >> > >> 1) making the (unique) JobSubmitHandler responsible for creating the jobgraphs, which includes downloading dependencies, is not an optimal solution
>>>> >>>>> >> >> > >> 2) even if we put the responsibility on the JobMaster, currently each job has its own JobMaster but they all run on the same process, so we have again a single entity.
>>>> >>>>> >> >> > >>
>>>> >>>>> >> >> > >> Of course after this is done, and if we feel comfortable with the solution, then we can go to the session mode.
>>>> >>>>> >> >> > >>
>>>> >>>>> >> >> > >> A second comment has to do with fault-tolerance in the per-job, cluster-deploy mode. In the document, it is suggested that upon recovery, the JobMaster of each job re-creates the JobGraph. I am just wondering if it is better to create and store the jobGraph upon submission and only fetch it upon recovery so that we have a static jobGraph.
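[Editor's sketch] The two recovery options weighed above (re-create the job graph on recovery versus store it once at submission and only fetch it on recovery) can be illustrated roughly as follows. This is a conceptual Python sketch and not Flink code: `JobConfig`, `GraphStore`, `compile_job_graph`, `submit`, and `recover` are all hypothetical names, and the in-memory dictionary merely stands in for a durable store such as a DFS.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class JobConfig:
    # Hypothetical subset of the settings a job graph would be derived from.
    job_id: str
    main_class: str
    parallelism: int


def compile_job_graph(config: JobConfig) -> str:
    """Stand-in for running the user main(); returns a serialized 'graph'."""
    graph = {"vertices": [config.main_class], **asdict(config)}
    return json.dumps(graph, sort_keys=True)


class GraphStore:
    """In-memory stand-in for a durable store (an assumption, not a Flink class)."""

    def __init__(self) -> None:
        self._graphs: Dict[str, str] = {}

    def put(self, job_id: str, graph: str) -> None:
        self._graphs[job_id] = graph

    def get(self, job_id: str) -> Optional[str]:
        return self._graphs.get(job_id)


def submit(config: JobConfig, store: GraphStore) -> str:
    # Compile once at submission time and persist the result.
    graph = compile_job_graph(config)
    store.put(config.job_id, graph)
    return graph


def recover(config: JobConfig, store: GraphStore) -> str:
    # Prefer the stored (static) graph; fall back to rebuilding from config.
    stored = store.get(config.job_id)
    if stored is not None:
        return stored
    return compile_job_graph(config)
```

The fallback branch is only safe under the assumption, stated elsewhere in this thread, that the same configuration always yields the same job graph; the stored-graph branch avoids that assumption at the cost of requiring a durable store.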
>>>> >>>>> >> >> > >>
>>>> >>>>> >> >> > >> Finally, I have a question: what happens with jobs that have multiple execute calls? The semantics seem to change compared to the current behaviour, right?
>>>> >>>>> >> >> > >>
>>>> >>>>> >> >> > >> Cheers,
>>>> >>>>> >> >> > >> Kostas
>>>> >>>>> >> >> > >>
>>>> >>>>> >> >> > >> On Wed, Jan 8, 2020 at 8:05 PM tison <wander4...@gmail.com> wrote:
>>>> >>>>> >> >> > >> >
>>>> >>>>> >> >> > >> > Not always; Yang Wang is also not yet a committer but he can join the channel. I cannot find the id by clicking “Add new member in channel”, so I came to you and asked you to try out the link. Possibly I will find other ways, but the original purpose is that the Slack channel is a public area where we discuss development...
>>>> >>>>> >> >> > >> > Best,
>>>> >>>>> >> >> > >> > tison.
>>>> >>>>> >> >> > >> >
>>>> >>>>> >> >> > >> > On Thu, Jan 9, 2020 at 2:44 AM Peter Huang <huangzhenqiu0...@gmail.com> wrote:
>>>> >>>>> >> >> > >> >
>>>> >>>>> >> >> > >> > > Hi Tison,
>>>> >>>>> >> >> > >> > >
>>>> >>>>> >> >> > >> > > I am not a Flink committer yet. I think I can't join it either.
>>>> >>>>> >> >> > >> > >
>>>> >>>>> >> >> > >> > > Best Regards
>>>> >>>>> >> >> > >> > > Peter Huang
>>>> >>>>> >> >> > >> > >
>>>> >>>>> >> >> > >> > > On Wed, Jan 8, 2020 at 9:39 AM tison <wander4...@gmail.com> wrote:
>>>> >>>>> >> >> > >> > >
>>>> >>>>> >> >> > >> > > > Hi Peter,
>>>> >>>>> >> >> > >> > > >
>>>> >>>>> >> >> > >> > > > Could you try out this link? https://the-asf.slack.com/messages/CNA3ADZPH
>>>> >>>>> >> >> > >> > > >
>>>> >>>>> >> >> > >> > > > Best,
>>>> >>>>> >> >> > >> > > > tison.
>>>> >>>>> >> >> > >> > > >
>>>> >>>>> >> >> > >> > > > On Thu, Jan 9, 2020 at 1:22 AM Peter Huang <huangzhenqiu0...@gmail.com> wrote:
>>>> >>>>> >> >> > >> > > >
>>>> >>>>> >> >> > >> > > > > Hi Tison,
>>>> >>>>> >> >> > >> > > > >
>>>> >>>>> >> >> > >> > > > > I can't join the group with the shared link. Would you please add me to the group? My Slack account is huangzhenqiu0825. Thank you in advance.
>>>> >>>>> >> >> > >> > > > >
>>>> >>>>> >> >> > >> > > > > Best Regards
>>>> >>>>> >> >> > >> > > > > Peter Huang
>>>> >>>>> >> >> > >> > > > >
>>>> >>>>> >> >> > >> > > > > On Wed, Jan 8, 2020 at 12:02 AM tison <wander4...@gmail.com> wrote:
>>>> >>>>> >> >> > >> > > > >
>>>> >>>>> >> >> > >> > > > > > Hi Peter,
>>>> >>>>> >> >> > >> > > > > >
>>>> >>>>> >> >> > >> > > > > > As described above, this effort should get attention from the people developing FLIP-73, a.k.a. Executor abstractions. I recommend that you join the public Slack channel [1] for Flink Client API Enhancement, where you can share your detailed thoughts. It will possibly get more concrete attention.
>>>> >>>>> >> >> > >> > > > > >
>>>> >>>>> >> >> > >> > > > > > Best,
>>>> >>>>> >> >> > >> > > > > > tison.
>>>> >>>>> >> >> > >> > > > > >
>>>> >>>>> >> >> > >> > > > > > [1] https://slack.com/share/IS21SJ75H/Rk8HhUly9FuEHb7oGwBZ33uL/enQtODg2MDYwNjE5MTg3LTA2MjIzNDc1M2ZjZDVlMjdlZjk1M2RkYmJhNjAwMTk2ZDZkODQ4NmY5YmI4OGRhNWJkYTViMTM1NzlmMzc4OWM
>>>> >>>>> >> >> > >> > > > > >
>>>> >>>>> >> >> > >> > > > > > On Tue, Jan 7, 2020 at 5:09 AM Peter Huang <huangzhenqiu0...@gmail.com> wrote:
>>>> >>>>> >> >> > >> > > > > >
>>>> >>>>> >> >> > >> > > > > > > Dear All,
>>>> >>>>> >> >> > >> > > > > > >
>>>> >>>>> >> >> > >> > > > > > > Happy new year! Based on the existing feedback from the community, we revised the doc to cover session cluster support, the concrete interface changes needed, and the execution plan. Please take one more round of review at your convenience.
>>>> >>>>> >> >> > >> > > > > > >
>>>> >>>>> >> >> > >> > > > > > > https://docs.google.com/document/d/1aAwVjdZByA-0CHbgv16Me-vjaaDMCfhX7TzVVTuifYM/edit#
>>>> >>>>> >> >> > >> > > > > > >
>>>> >>>>> >> >> > >> > > > > > > Best Regards
>>>> >>>>> >> >> > >> > > > > > > Peter Huang
>>>> >>>>> >> >> > >> > > > > > >
>>>> >>>>> >> >> > >> > > > > > > On Thu, Jan 2, 2020 at 11:29 AM Peter Huang <huangzhenqiu0...@gmail.com> wrote:
>>>> >>>>> >> >> > >> > > > > > >
>>>> >>>>> >> >> > >> > > > > > > > Hi Dian,
>>>> >>>>> >> >> > >> > > > > > > > Thanks for giving us valuable feedback.
>>>> >>>>> >> >> > >> > > > > > > >
>>>> >>>>> >> >> > >> > > > > > > > 1) It's better to have a whole design for this feature
>>>> >>>>> >> >> > >> > > > > > > > As for the suggestion of enabling the cluster mode for session clusters as well, I think Flink already supports it. WebSubmissionExtension already allows users to start a job with the specified jar by using the web UI. But we need to enable the feature from the CLI for both local and remote jars. I will align with Yang Wang first on the details and update the design doc.
>>>> >>>>> >> >> > >> > > > > > > >
>>>> >>>>> >> >> > >> > > > > > > > 2) It's better to consider the convenience for users, such as debugging
>>>> >>>>> >> >> > >> > > > > > > >
>>>> >>>>> >> >> > >> > > > > > > > I am wondering whether we can store the exception from job graph generation in the application master. As no streaming graph can be scheduled in this case, no more TMs will be requested from the FlinkRM. If the AM is still running, users can still query it from the CLI. As it requires more changes, we can get some feedback from <aljos...@apache.org> and @zjf...@gmail.com <zjf...@gmail.com>.
>>>> >>>>> >> >> > >> > > > > > > >
>>>> >>>>> >> >> > >> > > > > > > > 3) It's better to consider the impact to the stability of the cluster
>>>> >>>>> >> >> > >> > > > > > > >
>>>> >>>>> >> >> > >> > > > > > > > I agree with Yang Wang's opinion.
>>>> >>>>> >> >> > >> > > > > > > >
>>>> >>>>> >> >> > >> > > > > > > > Best Regards
>>>> >>>>> >> >> > >> > > > > > > > Peter Huang
>>>> >>>>> >> >> > >> > > > > > > >
>>>> >>>>> >> >> > >> > > > > > > > On Sun, Dec 29, 2019 at 9:44 PM Dian Fu <dian0511...@gmail.com> wrote:
>>>> >>>>> >> >> > >> > > > > > > >
>>>> >>>>> >> >> > >> > > > > > > >> Hi all,
>>>> >>>>> >> >> > >> > > > > > > >>
>>>> >>>>> >> >> > >> > > > > > > >> Sorry to jump into this discussion. Thanks everyone for the discussion. I'm very interested in this topic although I'm not an expert in this part, so I'm glad to share my thoughts as follows:
>>>> >>>>> >> >> > >> > > > > > > >>
>>>> >>>>> >> >> > >> > > > > > > >> 1) It's better to have a whole design for this feature
>>>> >>>>> >> >> > >> > > > > > > >> As we know, there are two deployment modes: per-job mode and session mode. I'm wondering which mode really needs this feature. As the design doc mentioned, per-job mode is used more for streaming jobs and session mode is usually used for batch jobs (of course, the job types and the deployment modes are orthogonal). Usually a streaming job only needs to be submitted once and will run for days or weeks, while batch jobs will be submitted more frequently than streaming jobs. This means that maybe session mode also needs this feature. However, if we support this feature in session mode, the application master will become the new centralized service (which should be solved). So in this case, it's better to have a complete design for both per-job mode and session mode. Furthermore, even if we can do it phase by phase, we need to have a whole picture of how it works in both per-job mode and session mode.

2) It's better to consider the convenience for users, such as debugging
After we finish this feature, the job graph will be compiled in the
application master, which means that users cannot easily get the exception
message synchronously in the job client if there are problems during job
graph compilation (especially for platform users), for example when the
resource path is incorrect or the user program itself has problems. What
I'm thinking is that maybe we should throw the exceptions as early as
possible (during the job submission stage).

3) It's better to consider the impact on the stability of the cluster
If we perform the compilation in the application master, we should
consider the impact of compilation errors.
Although YARN can restart the application master in case of failures, in
some cases a compilation failure may waste cluster resources and may
impact the stability of the cluster and the other jobs in it, for example
when the resource path is incorrect or the user program itself has
problems (in this case, job failover cannot solve this kind of problem).
In the current implementation, compilation errors are handled on the
client side and there is no impact on the cluster at all.

Regarding 1), the design doc clearly points out that only per-job mode
will be supported. However, I think it's better to also consider session
mode in the design doc.
Regarding 2) and 3), I have not seen related sections in the design doc.
It would be good if we could cover them in the design doc.

Feel free to correct me if there is anything I misunderstand.

Regards,
Dian

On Dec 27, 2019, at 3:13 AM, Peter Huang <huangzhenqiu0...@gmail.com> wrote:

Hi Yang,

I can't agree more. The effort definitely needs to align with the final
goal of FLIP-73.
I am thinking about whether we can achieve the goal in two phases.

1) Phase I
As the CliFrontend will not be deprecated soon, we can still use the
deployMode flag there, pass the program info through the Flink
configuration, and use the ClassPathJobGraphRetriever to generate the job
graph in the ClusterEntrypoints of YARN and Kubernetes.

2) Phase II
In AbstractJobClusterExecutor, the job graph is generated in the execute
function. We can still use the deployMode in it. With deployMode = cluster,
the execute function only starts the cluster.

When {Yarn/Kubernetes}PerJobClusterEntrypoint starts, it will start the
dispatcher first; then we can use a ClusterEnvironment similar to
ContextEnvironment to submit the job with jobName to the local dispatcher.
For the details, we need more investigation. Let's wait for
@Aljoscha Krettek <aljos...@apache.org> and
@Till Rohrmann <trohrm...@apache.org>'s feedback after the holiday season.

Thank you in advance. Merry Christmas and Happy New Year!!!

Best Regards
Peter Huang

On Wed, Dec 25, 2019 at 1:08 AM Yang Wang <danrtsey...@gmail.com> wrote:

Hi Peter,

I think we need to reconsider tison's suggestion seriously. After FLIP-73,
the deployJobCluster has been moved into `JobClusterExecutor#execute`. It
should not be visible to `CliFrontend`. That means the user program will
*ALWAYS* be executed on the client side. This is the by-design behavior.
So, we could not just add `if(client mode) .. else if(cluster mode) ...`
code in `CliFrontend` to bypass the executor.
We need to find a clean way to decouple executing the user program from
deploying the per-job cluster. Based on this, we could support executing
the user program on the client or the master side.

Maybe Aljoscha and Jeff could give some good suggestions.

Best,
Yang

Peter Huang <huangzhenqiu0...@gmail.com> wrote on Wed, Dec 25, 2019 at 4:03 AM:

Hi Jingjing,

The proposed improvement is a deployment option for the CLI. For SQL-based
Flink applications, it is more convenient to use the existing model in
SqlClient, in which the job graph is generated within SqlClient.
After adding the delayed job graph generation, I think no change is needed
on your side.

Best Regards
Peter Huang

On Wed, Dec 18, 2019 at 6:01 AM jingjing bai <baijingjing7...@gmail.com> wrote:

Hi Peter,

We have extended SqlClient to support SQL job submission from the web,
based on Flink 1.9. We support submitting to YARN in per-job mode too.
In this case, the job graph is generated on the client side. I think this
discussion is mainly about improving API programs, but in my case there is
no jar to upload, only a SQL string.
Do you have more suggestions for improving SQL mode, or is it only a
switch for API programs?

Best,
bai jj

Yang Wang <danrtsey...@gmail.com> wrote on Wed, Dec 18, 2019 at 7:21 PM:

I just want to revive this discussion.

Recently, I have been thinking about how to natively run a Flink per-job
cluster on Kubernetes. The per-job mode on Kubernetes is very different
from the one on Yarn, and we will have the same deployment requirements
for the client and the entry point.

1. The Flink client does not always need a local jar to start a Flink
per-job cluster. We could support multiple schemes.
For example, file:///path/of/my.jar means a jar located on the client
side, hdfs://myhdfs/user/myname/flink/my.jar means a jar located on remote
HDFS, and local:///path/in/image/my.jar means a jar located on the
jobmanager side.

2. Support running the user program on the master side. This also means
the entry point will generate the job graph on the master side. We could
use the ClasspathJobGraphRetriever or start a local Flink client to
achieve this.

cc tison, Aljoscha & Kostas: do you think this is the right direction to
work in?
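
The three URI schemes mentioned above could be told apart along these lines. This is only a minimal sketch in Python, not Flink code: `JAR_LOCATIONS`, `classify_jar` and `needs_upload` are hypothetical names, and the location labels are an assumption based on the descriptions in the message.

```python
from urllib.parse import urlparse

# Hypothetical mapping from URI scheme to where the jar is expected to live.
# The schemes come from the message above; the labels are illustrative only.
JAR_LOCATIONS = {
    "file": "client",       # jar on the submitting machine
    "hdfs": "remote",       # jar already on a distributed file system
    "local": "jobmanager",  # jar already present on the master (e.g. in the image)
}


def classify_jar(uri: str) -> str:
    """Return where a user jar is located, based only on its URI scheme."""
    scheme = urlparse(uri).scheme
    if scheme not in JAR_LOCATIONS:
        raise ValueError(f"unsupported jar scheme: {scheme!r}")
    return JAR_LOCATIONS[scheme]


def needs_upload(uri: str) -> bool:
    """Assumption: only client-side jars have to be shipped before deployment."""
    return classify_jar(uri) == "client"
```

Under this assumption, only `file://` jars would have to be shipped by the client; `hdfs://` and `local://` jars could be resolved by the entry point itself.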

tison <wander4...@gmail.com> wrote on Thu, Dec 12, 2019 at 4:48 PM:

A quick idea is that we separate the deployment from the user program, so
that deployment is always done outside the program. When the user program
is executed, there is always a ClusterClient that communicates with an
existing cluster, remote or local. That will be another thread, so this is
just for your information.

Best,
tison.

tison <wander4...@gmail.com> wrote on Thu, Dec 12, 2019 at 4:40 PM:

Hi Peter,

Another concern I realized recently is that with the current Executors
abstraction (FLIP-73), I'm afraid the user program is designed to ALWAYS
run on the client side. Specifically, we deploy the job in the executor
when env.execute is called. This abstraction possibly prevents Flink from
running the user program on the cluster side.

For your proposal, in this case we have already compiled the program and
run it on the client side; even if we deploy a cluster and retrieve the
job graph from program metadata, it doesn't make much sense.

cc Aljoscha & Kostas: what do you think about this constraint?

Best,
tison.

Peter Huang <huangzhenqiu0...@gmail.com> wrote on Tue, Dec 10, 2019 at 12:45 PM:

Hi Tison,

Yes, you are right. I think I made the wrong argument in the doc.
Basically, the jar packaging problem only affects platform users.
In our internal deploy service, we further optimized the deployment
latency by letting users package flink-runtime together with the uber jar,
so that we don't need to consider supporting multiple Flink versions for
now. In the session client mode, since the Flink libs will be shipped
anyway as local resources of YARN, users actually don't need to package
those libs into the job jar.

Best Regards
Peter Huang

On Mon, Dec 9, 2019 at 8:35 PM tison <wander4...@gmail.com> wrote:

> 3. What do you mean about the package? Do users need to compile their
> jars including flink-clients, flink-optimizer, flink-table codes?

The answer should be no because they exist in the system classpath.

Best,
tison.

Yang Wang <danrtsey...@gmail.com> wrote on Tue, Dec 10, 2019 at 12:18 PM:

Hi Peter,

Thanks a lot for starting this discussion.
I think this is a very useful feature.

Not only for Yarn: I am focused on the Flink on Kubernetes integration and
came across the same problem. I do not want the job graph generated on the
client side. Instead, the user jars are built into a user-defined image.
When the job manager is launched, we just need to generate the job graph
based on the local user jars.

I have some small suggestions about this.

1. `ProgramJobGraphRetriever` is very similar to
`ClasspathJobGraphRetriever`; the difference is that the former needs
`ProgramMetadata` and the latter needs some arguments. Is it possible to
have a unified `JobGraphRetriever` to support both?
2. Is it possible to not use a local user jar to start a per-job cluster?
In your case, the user jars already exist on HDFS and we do need to
download the jars to the deployer service. Currently, we always need a
local user jar to start a Flink cluster. It would be great if we could
support remote user jars.
> In the implementation, we assume users package flink-clients,
> flink-optimizer, flink-table together within the job jar. Otherwise, the
> job graph generation within JobClusterEntryPoint will fail.
3. What do you mean about the package? Do users need to compile their jars
including flink-clients, flink-optimizer, flink-table codes?
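
To make suggestion 1 concrete, one way a unified retriever could accept both kinds of input is sketched below. This is an assumption-laden illustration in Python, not Flink's API: `RetrievalRequest`, its fields and `choose_strategy` are hypothetical names, and the rule that a supplied jar URI selects the metadata-style path is invented for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class RetrievalRequest:
    """Hypothetical unified input for a single job graph retriever."""
    program_args: List[str] = field(default_factory=list)
    job_class_name: Optional[str] = None  # classpath-style lookup
    jar_uri: Optional[str] = None         # program-metadata-style lookup


def choose_strategy(request: RetrievalRequest) -> str:
    """Pick how the user program would be located.

    Assumption for this sketch: an explicit jar URI means the program is
    described by metadata; otherwise the entry class is searched for on the
    classpath.
    """
    if request.jar_uri is not None:
        return "program-metadata"
    return "classpath"
```

With one request type, the entry point would only need a single code path for building the job graph, whichever way the program was described.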

Best,
Yang

Peter Huang <huangzhenqiu0...@gmail.com> wrote on Tue, Dec 10, 2019 at 2:37 AM:

Dear All,

Recently, the Flink community started to improve the yarn cluster
descriptor to make the job jar and config files configurable from the CLI.
It improves the flexibility of Flink deployment in Yarn Per Job Mode.
For platform users who manage tens or hundreds of streaming pipelines for
the whole org or company, we found that job graph generation on the client
side is another pain point. Thus, we want to propose a configurable
feature for FlinkYarnSessionCli. The feature allows users to choose job
graph generation in the Flink ClusterEntryPoint, so that the job jar
doesn't need to be available locally for the job graph generation.
The >>>> >>>>> >> >> > >> > > proposal >>>> >>>>> >> >> > >> > > > is >>>> >>>>> >> >> > >> > > > > > > >> >>>>> organized >>>> >>>>> >> >> > >> > > > > > > >> >>>>>>>> as a >>>> >>>>> >> >> > >> > > > > > > >> >>>>>>>>>>> FLIP >>>> >>>>> >> >> > >> > > > > > > >> >>>>>>>>>>> >>>> >>>>> >> >> > >> > > > > > > >> >>>>>>>>>>> >>>> >>>>> >> >> > >> > > > > > > >> >>>>>>>>>> >>>> >>>>> >> >> > >> > > > > > > >> >>>>>>>>> >>>> >>>>> >> >> > >> > > > > > > >> >>>>>>>> >>>> >>>>> >> >> > >> > > > > > > >> >>>>>> >>>> >>>>> >> >> > >> > > > > > > >> >>>>> >>>> >>>>> >> >> > >> > > > > > > >> >>> >>>> >>>>> >> >> > >> > > > > > > >> >>>> >>>>> >> >> > >> > > > > > > >>>> >>>>> >> >> > >> > > > > > >>>> >>>>> >> >> > >> > > > > >>>> >>>>> >> >> > >> > > > >>>> >>>>> >> >> > >> > > >>>> >>>>> >> >> > >> >>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-85+Delayed+JobGraph+Generation >>>> >>>>> >> >> > >> > > > > > > >> >>>>>>>>>>> . >>>> >>>>> >> >> > >> > > > > > > >> >>>>>>>>>>> >>>> >>>>> >> >> > >> > > > > > > >> >>>>>>>>>>> Any questions and >>>> suggestions are welcomed. >>>> >>>>> >> >> > >> Thank >>>> >>>>> >> >> > >> > > you >>>> >>>>> >> >> > >> > > > in >>>> >>>>> >> >> > >> > > > > > > >> >>>>> advance. 
>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>>>>>> >>>> >>>>> >> >> > >> > > > > > > >> >>>>>>>>>>> >>>> >>>>> >> >> > >> > > > > > > >> >>>>>>>>>>> Best Regards >>>> >>>>> >> >> > >> > > > > > > >> >>>>>>>>>>> Peter Huang >>>> >>>>> >> >> > >> > > > > > > >> >>>>>>>>>>> >>>> >>>>> >> >> > >> > > > > > > >> >>>>>>>>>> >>>> >>>>> >> >> > >> > > > > > > >> >>>>>>>>> >>>> >>>>> >> >> > >> > > > > > > >> >>>>>>>> >>>> >>>>> >> >> > >> > > > > > > >> >>>>>>> >>>> >>>>> >> >> > >> > > > > > > >> >>>>>> >>>> >>>>> >> >> > >> > > > > > > >> >>>>> >>>> >>>>> >> >> > >> > > > > > > >> >>>> >>>> >>>>> >> >> > >> > > > > > > >> >>> >>>> >>>>> >> >> > >> > > > > > > >> >> >>>> >>>>> >> >> > >> > > > > > > >> >>>> >>>>> >> >> > >> > > > > > > >> >>>> >>>>> >> >> > >> > > > > > > >>>> >>>>> >> >> > >> > > > > > >>>> >>>>> >> >> > >> > > > > >>>> >>>>> >> >> > >> > > > >>>> >>>>> >> >> > >> > > >>>> >>>>> >> >> > >> >>>> >>>>> >> >> > > >>>> >>>>> >> >> >>>> >>>