Hi all,

And thanks for the discussion topics.

For the cluster lifecycle, it is the Entrypoint that will tear down the cluster when the application finishes. We should probably emphasise this a bit more in the FLIP.

For the -R flag, this was in the PoC that I published just as a quick implementation, so that I could move fast to the entrypoint part. Personally, I would not even be against having a separate command in the CLI for this, something like run-on-cluster or along those lines. What do you think?

For fetching jars, in the FLIP we say that as a first implementation we can have Local and DFS. I was wondering if, in the case of YARN, both could somehow be implemented using LocalResources, letting YARN do the actual fetch. But I have not investigated it further. Do you have any opinion on this?

Cheers,
Kostas

On Mon, Mar 9, 2020 at 10:47 AM Becket Qin <becket....@gmail.com> wrote:
>
> Thanks Yang,
>
> That would be very helpful!
>
> Jiangjie (Becket) Qin
>
> On Mon, Mar 9, 2020 at 3:31 PM Yang Wang <danrtsey...@gmail.com> wrote:
>>
>> Hi Becket,
>>
>> Thanks for your suggestion. We will update the FLIP to add/enrich the following parts.
>> * User cli option change, use "-R/--remote" to apply the cluster deploy mode
>> * Configuration change, how to specify remote user jars and dependencies
>> * The whole story about how "application mode" works, upload -> fetch -> submit job
>> * The cluster lifecycle, when and how the Flink cluster is destroyed
>>
>> Best,
>> Yang
>>
>> On Mon, Mar 9, 2020 at 12:34 PM Becket Qin <becket....@gmail.com> wrote:
>>>
>>> Thanks for the reply, tison and Yang,
>>>
>>> Regarding the public interface, is the "-R/--remote" option the only change? Will the users also need to provide a remote location to upload and store the jars, and a list of jars as dependencies to be uploaded?
>>>
>>> It would be important that the public interface section in the FLIP includes all the user-facing changes, including the CLI / configuration / metrics, etc.
>>> Can we update the FLIP to include the conclusion we have here in the ML?
>>>
>>> Thanks,
>>>
>>> Jiangjie (Becket) Qin
>>>
>>> On Mon, Mar 9, 2020 at 11:59 AM Yang Wang <danrtsey...@gmail.com> wrote:
>>>>
>>>> Hi Becket,
>>>>
>>>> Thanks for jumping in and sharing your concerns. I second tison's answer and just make some additions.
>>>>
>>>> > job submission interface
>>>>
>>>> This FLIP will introduce an interface for running the user `main()` on the cluster, named “ProgramDeployer”. However, it is not a public interface. It will be used in `CliFrontend` when the remote deploy option (-R/--remote-deploy) is specified. So the only change on the user side is the cli option.
>>>>
>>>> > How to fetch the jars?
>>>>
>>>> The “local path” and “dfs path” could be supported to fetch the user jars and dependencies. Just like tison has said, we could ship the user jar and dependencies from the client side to HDFS and use the entrypoint to fetch them.
>>>>
>>>> Also we have some other practical ways to use the new “application mode”.
>>>> 1. Upload the user jars and dependencies to the DFS (e.g. HDFS, S3, Aliyun OSS) manually or by some external deployer system. For K8s, the user jars and dependencies could also be built into the docker image.
>>>> 2. Specify the remote/local user jar and dependencies in `flink run`. Usually this could also be done by the external deployer system.
>>>> 3. When the `ClusterEntrypoint` is launched, it will fetch the jars and files automatically. We do not need any specific fetcher implementation, since we could leverage the Flink `FileSystem` to do this.
>>>>
>>>> Best,
>>>> Yang
>>>>
>>>> On Mon, Mar 9, 2020 at 11:34 AM tison <wander4...@gmail.com> wrote:
>>>>>
>>>>> Hi Becket,
>>>>>
>>>>> Thanks for your attention on FLIP-85! I answered your questions inline.
>>>>>
>>>>> 1. What exactly will the job submission interface look like after this FLIP? The FLIP template has a Public Interface section but it was removed from this FLIP.
>>>>>
>>>>> As Yang mentioned in this thread above:
>>>>>
>>>>> From the user perspective, only a `-R/--remote-deploy` cli option is visible. They are not aware of the application mode.
>>>>>
>>>>> 2. How will the new ClusterEntrypoint fetch the jars from external storage? What external storage will be supported out of the box? Will this "jar fetcher" be pluggable? If so, what does the API look like and how will users specify the custom "jar fetcher"?
>>>>>
>>>>> It depends actually. Here are several points:
>>>>>
>>>>> i. Currently, shipping user files is handled by Flink, and dependency fetching can be handled by Flink as well.
>>>>> ii. Currently, we only support local file system shipfiles. In Application Mode, to support meaningful jar fetching we should first let users configure a richer shipfiles schema.
>>>>> iii. Dependency fetching varies across deployments. That is, on YARN the convention is to go through HDFS; on Kubernetes the convention is a configured resource server, with files fetched by an initContainer.
>>>>>
>>>>> Thus, in the first phase of Application Mode, dependency fetching is handled entirely within Flink.
>>>>>
>>>>> 3. It sounds that in this FLIP, the "session cluster" running the application has the same lifecycle as the user application. How will the session cluster be torn down after the application finishes? Will the ClusterEntrypoint do that? Will there be an option of not tearing the cluster down?
>>>>>
>>>>> The precondition for tearing down the cluster is *both*:
>>>>>
>>>>> i. the user main has reached its end
>>>>> ii. all submitted jobs (currently, at most one) have reached a globally terminal state
>>>>>
>>>>> For the "how", it is an implementation topic, but conceptually it is the ClusterEntrypoint's responsibility.
>>>>>
>>>>> >Will there be an option of not tearing the cluster down?
>>>>>
>>>>> I think the answer is "No" because the cluster is designed to be bound to an Application. User logic that communicates with the job is always in its `main`, and for history information we have the history server.
>>>>>
>>>>> Best,
>>>>> tison.
>>>>>
>>>>> On Mon, Mar 9, 2020 at 8:12 AM Becket Qin <becket....@gmail.com> wrote:
>>>>>>
>>>>>> Hi Peter and Kostas,
>>>>>>
>>>>>> Thanks for creating this FLIP. Moving the JobGraph compilation to the cluster makes a lot of sense to me. FLIP-40 had exactly the same idea, but is currently dormant and can probably be superseded by this FLIP. After reading the FLIP, I still have a few questions.
>>>>>>
>>>>>> 1. What exactly will the job submission interface look like after this FLIP? The FLIP template has a Public Interface section but it was removed from this FLIP.
>>>>>> 2. How will the new ClusterEntrypoint fetch the jars from external storage? What external storage will be supported out of the box? Will this "jar fetcher" be pluggable? If so, what does the API look like and how will users specify the custom "jar fetcher"?
>>>>>> 3. It sounds that in this FLIP, the "session cluster" running the application has the same lifecycle as the user application. How will the session cluster be torn down after the application finishes? Will the ClusterEntrypoint do that? Will there be an option of not tearing the cluster down?
>>>>>>
>>>>>> Maybe they have been discussed in the ML earlier, but I think they should be part of the FLIP also.
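
The teardown precondition described above (the user main has ended and every submitted job is in a globally terminal state) can be sketched as a small predicate. This is only an illustration in Python, not Flink code: `ApplicationState`, `should_tear_down` and the `JobStatus` values are hypothetical names, and treating FINISHED, CANCELED and FAILED as the globally terminal states is an assumption of the sketch.

```python
from dataclasses import dataclass, field
from enum import Enum


class JobStatus(Enum):
    """Hypothetical job states; only the terminal / non-terminal split matters."""
    RUNNING = "RUNNING"
    SUSPENDED = "SUSPENDED"  # assumed NOT globally terminal in this sketch
    FINISHED = "FINISHED"
    CANCELED = "CANCELED"
    FAILED = "FAILED"


# Assumption: these are the states from which a job never resumes.
GLOBALLY_TERMINAL = {JobStatus.FINISHED, JobStatus.CANCELED, JobStatus.FAILED}


@dataclass
class ApplicationState:
    """What a (hypothetical) entrypoint would track for one application."""
    user_main_finished: bool = False
    job_statuses: dict = field(default_factory=dict)  # job id -> JobStatus


def should_tear_down(state: ApplicationState) -> bool:
    """Both conditions must hold: main has ended AND all jobs are globally terminal."""
    all_jobs_done = all(
        status in GLOBALLY_TERMINAL for status in state.job_statuses.values()
    )
    return state.user_main_finished and all_jobs_done
```

Note that with no submitted jobs the predicate reduces to whether the user main has ended.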
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Jiangjie (Becket) Qin
>>>>>>
>>>>>> On Thu, Mar 5, 2020 at 10:09 PM Kostas Kloudas <kklou...@gmail.com> wrote:
>>>>>>>
>>>>>>> Also from my side +1 to start voting.
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Kostas
>>>>>>>
>>>>>>> On Thu, Mar 5, 2020 at 7:45 AM tison <wander4...@gmail.com> wrote:
>>>>>>> >
>>>>>>> > +1 to start voting.
>>>>>>> >
>>>>>>> > Best,
>>>>>>> > tison.
>>>>>>> >
>>>>>>> > On Thu, Mar 5, 2020 at 2:29 PM Yang Wang <danrtsey...@gmail.com> wrote:
>>>>>>> >>
>>>>>>> >> Hi Peter,
>>>>>>> >> Many thanks for your response.
>>>>>>> >>
>>>>>>> >> Hi all @Kostas Kloudas @Zili Chen @Peter Huang @Rong Rong
>>>>>>> >> It seems that we have reached an agreement. The “application mode” is regarded as the enhanced “per-job”. It is orthogonal to “cluster deploy”. Currently, we bind “per-job” to `run-user-main-on-client` and “application mode” to `run-user-main-on-cluster`.
>>>>>>> >>
>>>>>>> >> Do you have other concerns about moving FLIP-85 to voting?
>>>>>>> >>
>>>>>>> >> Best,
>>>>>>> >> Yang
>>>>>>> >>
>>>>>>> >> On Thu, Mar 5, 2020 at 12:48 PM Peter Huang <huangzhenqiu0...@gmail.com> wrote:
>>>>>>> >>>
>>>>>>> >>> Hi Yang and Kostas,
>>>>>>> >>>
>>>>>>> >>> Thanks for the clarification. It makes more sense to me if the long-term goal is to replace per-job mode with application mode in the future (once multiple execute calls can be supported). Before that, it will be better to keep the concept of application mode internal. As Yang suggested, users only need to use a `-R/--remote-deploy` cli option to launch a per-job cluster with the main function executed in the cluster entry point. +1 for the execution plan.
>>>>>>> >>>
>>>>>>> >>> Best Regards
>>>>>>> >>> Peter Huang
>>>>>>> >>>
>>>>>>> >>> On Tue, Mar 3, 2020 at 7:11 AM Yang Wang <danrtsey...@gmail.com> wrote:
>>>>>>> >>>>
>>>>>>> >>>> Hi Peter,
>>>>>>> >>>>
>>>>>>> >>>> Having the application mode does not mean we will drop the cluster-deploy option. I just want to share some thoughts about “Application Mode”.
>>>>>>> >>>>
>>>>>>> >>>> 1. The application mode could cover the per-job semantics. Its lifecycle is bound to the user `main()`. And all the jobs in the user main will be executed in the same Flink cluster. In the first phase of the FLIP-85 implementation, running the user main on the cluster side could be supported in application mode.
>>>>>>> >>>>
>>>>>>> >>>> 2. Maybe in the future, we also need to support multiple `execute()` calls on the client side in the same Flink cluster. Then the per-job mode will evolve into application mode.
>>>>>>> >>>>
>>>>>>> >>>> 3. From the user perspective, only a `-R/--remote-deploy` cli option is visible. They are not aware of the application mode.
>>>>>>> >>>>
>>>>>>> >>>> 4. In the first phase, the application mode works as “per-job” (only one job in the user main). We just leave more potential for the future.
>>>>>>> >>>>
>>>>>>> >>>> I am not against calling it “cluster deploy mode” if you all think it is clearer for users.
>>>>>>> >>>>
>>>>>>> >>>> Best,
>>>>>>> >>>> Yang
>>>>>>> >>>>
>>>>>>> >>>> On Tue, Mar 3, 2020 at 6:49 PM Kostas Kloudas <kklou...@gmail.com> wrote:
>>>>>>> >>>>>
>>>>>>> >>>>> Hi Peter,
>>>>>>> >>>>>
>>>>>>> >>>>> I understand your point. This is why I was also a bit torn about the name, and my proposal was a bit aligned with yours (something along the lines of "cluster deploy" mode).
>>>>>>> >>>>>
>>>>>>> >>>>> But many of the other participants in the discussion suggested the "Application Mode". I think that the reasoning is that now the user's Application is more self-contained. It will be submitted to the cluster and the user can just disconnect. In addition, as discussed briefly in the doc, in the future there may be better support for multi-execute applications, which will bring us one step closer to the true "Application Mode". But this is how I interpreted their arguments; of course they can also express their thoughts on the topic :)
>>>>>>> >>>>>
>>>>>>> >>>>> Cheers,
>>>>>>> >>>>> Kostas
>>>>>>> >>>>>
>>>>>>> >>>>> On Mon, Mar 2, 2020 at 6:15 PM Peter Huang <huangzhenqiu0...@gmail.com> wrote:
>>>>>>> >>>>> >
>>>>>>> >>>>> > Hi Kostas,
>>>>>>> >>>>> >
>>>>>>> >>>>> > Thanks for updating the wiki. We are aligned on the implementations in the doc. But I feel the naming is still a little bit confusing from a user's perspective. It is well known that Flink supports per-job clusters and session clusters. That concept is at the layer of how a job is managed within Flink. The method introduced until now is a kind of mix of job and session cluster, as a compromise on implementation complexity. We probably don't need to label it as Application Mode at the same layer as per-job cluster and session cluster. Conceptually, I think it is still a cluster mode implementation for per-job cluster.
>>>>>>> >>>>> >
>>>>>>> >>>>> > To minimize confusion for users, I think it would be better to make it just an option of the per-job cluster for each type of cluster manager. What do you think?
>>>>>>> >>>>> >
>>>>>>> >>>>> > Best Regards
>>>>>>> >>>>> > Peter Huang
>>>>>>> >>>>> >
>>>>>>> >>>>> > On Mon, Mar 2, 2020 at 7:22 AM Kostas Kloudas <kklou...@gmail.com> wrote:
>>>>>>> >>>>> >>
>>>>>>> >>>>> >> Hi Yang,
>>>>>>> >>>>> >>
>>>>>>> >>>>> >> The difference between per-job and application mode is that, as you described, in the per-job mode the main is executed on the client while in the application mode, the main is executed on the cluster. I do not think we have to offer "application mode" with running the main on the client side, as this is exactly what the per-job mode does currently and, as you also described, it would be redundant.
>>>>>>> >>>>> >>
>>>>>>> >>>>> >> Sorry if this was not clear in the document.
>>>>>>> >>>>> >>
>>>>>>> >>>>> >> Cheers,
>>>>>>> >>>>> >> Kostas
>>>>>>> >>>>> >>
>>>>>>> >>>>> >> On Mon, Mar 2, 2020 at 3:17 PM Yang Wang <danrtsey...@gmail.com> wrote:
>>>>>>> >>>>> >> >
>>>>>>> >>>>> >> > Hi Kostas,
>>>>>>> >>>>> >> >
>>>>>>> >>>>> >> > Thanks a lot for your conclusion and for updating the FLIP-85 wiki. Currently, I have no more questions about motivation, approach, fault tolerance and the first phase implementation.
>>>>>>> >>>>> >> >
>>>>>>> >>>>> >> > I think the new title "Flink Application Mode" makes a lot of sense to me. Especially for the containerized environment, the cluster deploy option will be very useful.
>>>>>>> >>>>> >> >
>>>>>>> >>>>> >> > Just one concern: how do we introduce this new application mode to our users? Each user program (i.e. `main()`) is an application. Currently, we intend to only support one `execute()`. So what's the difference between per-job and application mode?
>>>>>>> >>>>> >> >
>>>>>>> >>>>> >> > For per-job, the user `main()` is always executed on the client side. And for application mode, the user `main()` could be executed on the client or master side (configured via cli option). Right? We need to have a clear concept. Otherwise, the users will be more and more confused.
>>>>>>> >>>>> >> >
>>>>>>> >>>>> >> > Best,
>>>>>>> >>>>> >> > Yang
>>>>>>> >>>>> >> >
>>>>>>> >>>>> >> > On Mon, Mar 2, 2020 at 5:58 PM Kostas Kloudas <kklou...@gmail.com> wrote:
>>>>>>> >>>>> >> >>
>>>>>>> >>>>> >> >> Hi all,
>>>>>>> >>>>> >> >>
>>>>>>> >>>>> >> >> I updated
>>>>>>> >>>>> >> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-85+Flink+Application+Mode
>>>>>>> >>>>> >> >> based on the discussion we had here:
>>>>>>> >>>>> >> >>
>>>>>>> >>>>> >> >> https://docs.google.com/document/d/1ji72s3FD9DYUyGuKnJoO4ApzV-nSsZa0-bceGXW7Ocw/edit#
>>>>>>> >>>>> >> >>
>>>>>>> >>>>> >> >> Please let me know what you think and please keep the discussion in the ML :)
>>>>>>> >>>>> >> >>
>>>>>>> >>>>> >> >> Thanks for starting the discussion and I hope that soon we will be able to vote on the FLIP.
>>>>>>> >>>>> >> >>
>>>>>>> >>>>> >> >> Cheers,
>>>>>>> >>>>> >> >> Kostas
>>>>>>> >>>>> >> >>
>>>>>>> >>>>> >> >> On Thu, Jan 16, 2020 at 3:40 AM Yang Wang <danrtsey...@gmail.com> wrote:
>>>>>>> >>>>> >> >> >
>>>>>>> >>>>> >> >> > Hi all,
>>>>>>> >>>>> >> >> >
>>>>>>> >>>>> >> >> > Thanks a lot for the feedback from @Kostas Kloudas. All your concerns are on point. FLIP-85 is mainly focused on supporting cluster mode for per-job, since it is more urgent and has many more use cases in both Yarn and Kubernetes deployments. For session cluster, we could have more discussion in a new thread later.
>>>>>>> >>>>> >> >> >
>>>>>>> >>>>> >> >> > #1, How to download the user jars and dependencies for per-job in cluster mode?
>>>>>>> >>>>> >> >> > For Yarn, we could register the user jars and dependencies as LocalResource. They will be distributed by Yarn. And once the JobManager and TaskManager are launched, the jars already exist.
>>>>>>> >>>>> >> >> > For Standalone per-job and K8s, we expect that the user jars and dependencies are built into the image. Or the InitContainer could be used for downloading. It is natively distributed and we will not have a bottleneck.
>>>>>>> >>>>> >> >> >
>>>>>>> >>>>> >> >> > #2, Job graph recovery
>>>>>>> >>>>> >> >> > We could have an optimization to store the job graph on the DFS. However, I suggest that building a new job graph from the configuration be the default option, since we will not always have a DFS store when deploying a Flink per-job cluster. Of course, we assume that using the same configuration (e.g. job_id, user_jar, main_class, main_args, parallelism, savepoint_settings, etc.) will produce the same job graph. I think the standalone per-job already has similar behavior.
>>>>>>> >>>>> >> >> >
>>>>>>> >>>>> >> >> > #3, What happens with jobs that have multiple execute calls?
>>>>>>> >>>>> >> >> > Currently, it is really a problem. Even if we use a local client on the Flink master side, it will behave differently from client mode. For client mode, if we execute multiple times, then we will deploy multiple Flink clusters, one for each execute. I am not quite sure whether it is reasonable. However, I still think using the local client is a good choice. We could continue the discussion in a new thread. @Zili Chen <wander4...@gmail.com> Do you want to drive this?
>>>>>>> >>>>> >> >> >
>>>>>>> >>>>> >> >> > Best,
>>>>>>> >>>>> >> >> > Yang
>>>>>>> >>>>> >> >> >
>>>>>>> >>>>> >> >> > On Thu, Jan 16, 2020 at 1:55 AM Peter Huang <huangzhenqiu0...@gmail.com> wrote:
>>>>>>> >>>>> >> >> >
>>>>>>> >>>>> >> >> > > Hi Kostas,
>>>>>>> >>>>> >> >> > >
>>>>>>> >>>>> >> >> > > Thanks for this feedback. I can't agree more with the opinion. The cluster mode should be added first in the per-job cluster.
>>>>>>> >>>>> >> >> > >
>>>>>>> >>>>> >> >> > > 1) For job cluster implementation
>>>>>>> >>>>> >> >> > > 1. Job graph recovery from configuration, or stored as a static job graph as in the session cluster. I think the static one will be better for shorter recovery time. Let me update the doc for details.
>>>>>>> >>>>> >> >> > >
>>>>>>> >>>>> >> >> > > 2. For jobs that execute multiple times, I think @Zili Chen <wander4...@gmail.com> has proposed the local client solution that can actually run the program in the cluster entry point. We can put the implementation in the second stage, or even a new FLIP for further discussion.
>>>>>>> >>>>> >> >> > >
>>>>>>> >>>>> >> >> > > 2) For session cluster implementation
>>>>>>> >>>>> >> >> > > We can disable the cluster mode for the session cluster in the first stage. I agree the jar downloading will be a painful thing. We can consider a PoC and performance evaluation first. If the end-to-end experience is good enough, then we can consider proceeding with the solution.
>>>>>>> >>>>> >> >> > >
>>>>>>> >>>>> >> >> > > Looking forward to more opinions from @Yang Wang <danrtsey...@gmail.com> @Zili Chen <wander4...@gmail.com> @Dian Fu <dian0511...@gmail.com>.
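
The two recovery options weighed in this part of the thread (read back a stored, static job graph, or rebuild it from the configuration) could be combined roughly as follows. This is a minimal Python sketch under stated assumptions, not Flink code: `build_job_graph` and `recover_job_graph` are hypothetical names, the job graph is reduced to a fingerprint of the configuration, the store is a plain dictionary standing in for a DFS, and the sketch simply assumes that the same configuration always yields the same job graph.

```python
import hashlib
import json
from typing import Optional


def build_job_graph(config: dict) -> dict:
    """Stand-in for compiling the user program.

    Assumption: compilation is deterministic in the configuration, which is
    modelled here by hashing a canonical (key-sorted) JSON encoding of it.
    """
    canonical = json.dumps(config, sort_keys=True)
    return {
        "job_id": config["job_id"],
        "fingerprint": hashlib.sha256(canonical.encode("utf-8")).hexdigest(),
    }


def recover_job_graph(config: dict, store: Optional[dict] = None) -> dict:
    """Prefer a stored (static) job graph; otherwise rebuild from configuration."""
    if store is not None and config["job_id"] in store:
        # A store is available and holds this job: no recompilation needed.
        return store[config["job_id"]]
    # No store, or nothing stored for this job: rebuild from configuration.
    return build_job_graph(config)
```

The fallback order is a design choice of the sketch: rebuilding needs no extra storage, while the stored graph avoids recompilation on recovery.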
>>>>>>> >>>>> >> >> > >
>>>>>>> >>>>> >> >> > > Best Regards
>>>>>>> >>>>> >> >> > > Peter Huang
>>>>>>> >>>>> >> >> > >
>>>>>>> >>>>> >> >> > > On Wed, Jan 15, 2020 at 7:50 AM Kostas Kloudas <kklou...@gmail.com> wrote:
>>>>>>> >>>>> >> >> > >
>>>>>>> >>>>> >> >> > >> Hi all,
>>>>>>> >>>>> >> >> > >>
>>>>>>> >>>>> >> >> > >> I am writing here as the discussion on the Google Doc seems to be a bit difficult to follow.
>>>>>>> >>>>> >> >> > >>
>>>>>>> >>>>> >> >> > >> I think that in order to be able to make progress, it would be helpful to focus on per-job mode for now.
>>>>>>> >>>>> >> >> > >> The reason is that:
>>>>>>> >>>>> >> >> > >> 1) making the (unique) JobSubmitHandler responsible for creating the jobgraphs, which includes downloading dependencies, is not an optimal solution
>>>>>>> >>>>> >> >> > >> 2) even if we put the responsibility on the JobMaster, currently each job has its own JobMaster but they all run in the same process, so we again have a single entity.
>>>>>>> >>>>> >> >> > >>
>>>>>>> >>>>> >> >> > >> Of course after this is done, and if we feel comfortable with the solution, then we can go to the session mode.
>>>>>>> >>>>> >> >> > >>
>>>>>>> >>>>> >> >> > >> A second comment has to do with fault-tolerance in the per-job, cluster-deploy mode.
>>>>>>> >>>>> >> >> > >> In the document, it is suggested that upon recovery, the JobMaster of each job re-creates the JobGraph. I am just wondering if it is better to create and store the jobGraph upon submission and only fetch it upon recovery, so that we have a static jobGraph.
>>>>>>> >>>>> >> >> > >>
>>>>>>> >>>>> >> >> > >> Finally, I have a question: what happens with jobs that have multiple execute calls?
>>>>>>> >>>>> >> >> > >> The semantics seem to change compared to the current behaviour, right?
>>>>>>> >>>>> >> >> > >>
>>>>>>> >>>>> >> >> > >> Cheers,
>>>>>>> >>>>> >> >> > >> Kostas
>>>>>>> >>>>> >> >> > >>
>>>>>>> >>>>> >> >> > >> On Wed, Jan 8, 2020 at 8:05 PM tison <wander4...@gmail.com> wrote:
>>>>>>> >>>>> >> >> > >> >
>>>>>>> >>>>> >> >> > >> > Not always; Yang Wang is also not yet a committer but he can join the channel. I could not find your id by clicking “Add new member in channel”, so I came to you and asked you to try out the link. Possibly I will find other ways, but the original purpose is that the Slack channel is a public area where we discuss development...
>>>>>>> >>>>> >> >> > >> > Best,
>>>>>>> >>>>> >> >> > >> > tison.
>>>>>>> >>>>> >> >> > >> >
>>>>>>> >>>>> >> >> > >> > On Thu, Jan 9, 2020 at 2:44 AM Peter Huang <huangzhenqiu0...@gmail.com> wrote:
>>>>>>> >>>>> >> >> > >> >
>>>>>>> >>>>> >> >> > >> > > Hi Tison,
>>>>>>> >>>>> >> >> > >> > >
>>>>>>> >>>>> >> >> > >> > > I am not a committer of Flink yet. I think I can't join it either.
>>>>>>> >>>>> >> >> > >> > > >>>>>>> >>>>> >> >> > >> > > >>>>>>> >>>>> >> >> > >> > > Best Regards >>>>>>> >>>>> >> >> > >> > > Peter Huang >>>>>>> >>>>> >> >> > >> > > >>>>>>> >>>>> >> >> > >> > > On Wed, Jan 8, 2020 at 9:39 AM tison >>>>>>> >>>>> >> >> > >> > > <wander4...@gmail.com> wrote: >>>>>>> >>>>> >> >> > >> > > >>>>>>> >>>>> >> >> > >> > > > Hi Peter, >>>>>>> >>>>> >> >> > >> > > > >>>>>>> >>>>> >> >> > >> > > > Could you try out this link? >>>>>>> >>>>> >> >> > >> > > https://the-asf.slack.com/messages/CNA3ADZPH >>>>>>> >>>>> >> >> > >> > > > >>>>>>> >>>>> >> >> > >> > > > Best, >>>>>>> >>>>> >> >> > >> > > > tison. >>>>>>> >>>>> >> >> > >> > > > >>>>>>> >>>>> >> >> > >> > > > >>>>>>> >>>>> >> >> > >> > > > Peter Huang <huangzhenqiu0...@gmail.com> >>>>>>> >>>>> >> >> > >> > > > 于2020年1月9日周四 上午1:22写道: >>>>>>> >>>>> >> >> > >> > > > >>>>>>> >>>>> >> >> > >> > > > > Hi Tison, >>>>>>> >>>>> >> >> > >> > > > > >>>>>>> >>>>> >> >> > >> > > > > I can't join the group with shared link. Would >>>>>>> >>>>> >> >> > >> > > > > you please add me >>>>>>> >>>>> >> >> > >> into >>>>>>> >>>>> >> >> > >> > > the >>>>>>> >>>>> >> >> > >> > > > > group? My slack account is huangzhenqiu0825. >>>>>>> >>>>> >> >> > >> > > > > Thank you in advance. >>>>>>> >>>>> >> >> > >> > > > > >>>>>>> >>>>> >> >> > >> > > > > >>>>>>> >>>>> >> >> > >> > > > > Best Regards >>>>>>> >>>>> >> >> > >> > > > > Peter Huang >>>>>>> >>>>> >> >> > >> > > > > >>>>>>> >>>>> >> >> > >> > > > > On Wed, Jan 8, 2020 at 12:02 AM tison >>>>>>> >>>>> >> >> > >> > > > > <wander4...@gmail.com> >>>>>>> >>>>> >> >> > >> wrote: >>>>>>> >>>>> >> >> > >> > > > > >>>>>>> >>>>> >> >> > >> > > > > > Hi Peter, >>>>>>> >>>>> >> >> > >> > > > > > >>>>>>> >>>>> >> >> > >> > > > > > As described above, this effort should get >>>>>>> >>>>> >> >> > >> > > > > > attention from people >>>>>>> >>>>> >> >> > >> > > > > developing >>>>>>> >>>>> >> >> > >> > > > > > FLIP-73 a.k.a. Executor abstractions. 
I >>>>>>> >>>>> >> >> > >> > > > > > recommend you to join >>>>>>> >>>>> >> >> > >> the >>>>>>> >>>>> >> >> > >> > > > public >>>>>>> >>>>> >> >> > >> > > > > > slack channel[1] for Flink Client API >>>>>>> >>>>> >> >> > >> > > > > > Enhancement and you can >>>>>>> >>>>> >> >> > >> try to >>>>>>> >>>>> >> >> > >> > > > > share >>>>>>> >>>>> >> >> > >> > > > > > you detailed thoughts there. It possibly >>>>>>> >>>>> >> >> > >> > > > > > gets more concrete >>>>>>> >>>>> >> >> > >> > > attentions. >>>>>>> >>>>> >> >> > >> > > > > > >>>>>>> >>>>> >> >> > >> > > > > > Best, >>>>>>> >>>>> >> >> > >> > > > > > tison. >>>>>>> >>>>> >> >> > >> > > > > > >>>>>>> >>>>> >> >> > >> > > > > > [1] >>>>>>> >>>>> >> >> > >> > > > > > >>>>>>> >>>>> >> >> > >> > > > > > >>>>>>> >>>>> >> >> > >> > > > > >>>>>>> >>>>> >> >> > >> > > > >>>>>>> >>>>> >> >> > >> > > >>>>>>> >>>>> >> >> > >> https://slack.com/share/IS21SJ75H/Rk8HhUly9FuEHb7oGwBZ33uL/enQtODg2MDYwNjE5MTg3LTA2MjIzNDc1M2ZjZDVlMjdlZjk1M2RkYmJhNjAwMTk2ZDZkODQ4NmY5YmI4OGRhNWJkYTViMTM1NzlmMzc4OWM >>>>>>> >>>>> >> >> > >> > > > > > >>>>>>> >>>>> >> >> > >> > > > > > >>>>>>> >>>>> >> >> > >> > > > > > Peter Huang <huangzhenqiu0...@gmail.com> >>>>>>> >>>>> >> >> > >> > > > > > 于2020年1月7日周二 上午5:09写道: >>>>>>> >>>>> >> >> > >> > > > > > >>>>>>> >>>>> >> >> > >> > > > > > > Dear All, >>>>>>> >>>>> >> >> > >> > > > > > > >>>>>>> >>>>> >> >> > >> > > > > > > Happy new year! According to existing >>>>>>> >>>>> >> >> > >> > > > > > > feedback from the >>>>>>> >>>>> >> >> > >> community, >>>>>>> >>>>> >> >> > >> > > we >>>>>>> >>>>> >> >> > >> > > > > > > revised the doc with the consideration of >>>>>>> >>>>> >> >> > >> > > > > > > session cluster >>>>>>> >>>>> >> >> > >> support, >>>>>>> >>>>> >> >> > >> > > > and >>>>>>> >>>>> >> >> > >> > > > > > > concrete interface changes needed and >>>>>>> >>>>> >> >> > >> > > > > > > execution plan. 
Please >>>>>>> >>>>> >> >> > >> take >>>>>>> >>>>> >> >> > >> > > one >>>>>>> >>>>> >> >> > >> > > > > > more >>>>>>> >>>>> >> >> > >> > > > > > > round of review at your most convenient >>>>>>> >>>>> >> >> > >> > > > > > > time. >>>>>>> >>>>> >> >> > >> > > > > > > >>>>>>> >>>>> >> >> > >> > > > > > > >>>>>>> >>>>> >> >> > >> > > > > > > >>>>>>> >>>>> >> >> > >> > > > > > >>>>>>> >>>>> >> >> > >> > > > > >>>>>>> >>>>> >> >> > >> > > > >>>>>>> >>>>> >> >> > >> > > >>>>>>> >>>>> >> >> > >> https://docs.google.com/document/d/1aAwVjdZByA-0CHbgv16Me-vjaaDMCfhX7TzVVTuifYM/edit# >>>>>>> >>>>> >> >> > >> > > > > > > >>>>>>> >>>>> >> >> > >> > > > > > > >>>>>>> >>>>> >> >> > >> > > > > > > Best Regards >>>>>>> >>>>> >> >> > >> > > > > > > Peter Huang >>>>>>> >>>>> >> >> > >> > > > > > > >>>>>>> >>>>> >> >> > >> > > > > > > >>>>>>> >>>>> >> >> > >> > > > > > > >>>>>>> >>>>> >> >> > >> > > > > > > >>>>>>> >>>>> >> >> > >> > > > > > > >>>>>>> >>>>> >> >> > >> > > > > > > On Thu, Jan 2, 2020 at 11:29 AM Peter >>>>>>> >>>>> >> >> > >> > > > > > > Huang < >>>>>>> >>>>> >> >> > >> > > > > huangzhenqiu0...@gmail.com> >>>>>>> >>>>> >> >> > >> > > > > > > wrote: >>>>>>> >>>>> >> >> > >> > > > > > > >>>>>>> >>>>> >> >> > >> > > > > > > > Hi Dian, >>>>>>> >>>>> >> >> > >> > > > > > > > Thanks for giving us valuable feedbacks. >>>>>>> >>>>> >> >> > >> > > > > > > > >>>>>>> >>>>> >> >> > >> > > > > > > > 1) It's better to have a whole design >>>>>>> >>>>> >> >> > >> > > > > > > > for this feature >>>>>>> >>>>> >> >> > >> > > > > > > > For the suggestion of enabling the >>>>>>> >>>>> >> >> > >> > > > > > > > cluster mode also session >>>>>>> >>>>> >> >> > >> > > > > cluster, I >>>>>>> >>>>> >> >> > >> > > > > > > > think Flink already supported it. 
WebSubmissionExtension already allows users to start a job with a specified jar through the web UI. But we need to enable the feature from the CLI for both local and remote jars. I will first align with Yang Wang on the details and then update the design doc.

2) It's better to consider the convenience for users, such as debugging

I am wondering whether we can store the exception thrown during job graph generation in the application master. As no streaming graph can be scheduled in this case, no more TMs will be requested from FlinkRM. If the AM is still running, users can still query it from the CLI.
As it requires more changes, we can get some feedback from <aljos...@apache.org> and @zjf...@gmail.com <zjf...@gmail.com>.

3) It's better to consider the impact to the stability of the cluster

I agree with Yang Wang's opinion.

Best Regards
Peter Huang

On Sun, Dec 29, 2019 at 9:44 PM Dian Fu <dian0511...@gmail.com> wrote:

Hi all,

Sorry to jump into this discussion. Thanks everyone for the discussion. I'm very interested in this topic although I'm not an expert in this part.
So I'm glad to share my thoughts as follows:

1) It's better to have a whole design for this feature

As we know, there are two deployment modes: per-job mode and session mode. I'm wondering which mode really needs this feature. As the design doc mentions, per-job mode is used more for streaming jobs and session mode is usually used for batch jobs (of course, the job types and the deployment modes are orthogonal). Usually a streaming job only needs to be submitted once and then runs for days or weeks, while batch jobs are submitted more frequently than streaming jobs. This means that session mode may also need this feature.
However, if we support this feature in session mode, the application master will become the new centralized service (which should be solved). So in this case, it's better to have a complete design for both per-job mode and session mode. Furthermore, even if we can do it phase by phase, we need to have a whole picture of how it works in both per-job mode and session mode.
2) It's better to consider the convenience for users, such as debugging

After we finish this feature, the job graph will be compiled in the application master, which means that users cannot easily get the exception message synchronously in the job client if there are problems during job graph compilation (especially for platform users), for example when the resource path is incorrect or the user program itself has problems. What I'm thinking is that maybe we should throw the exceptions as early as possible (during the job submission stage).
3) It's better to consider the impact to the stability of the cluster

If we perform the compilation in the application master, we should consider the impact of compilation errors. Although YARN can restart the application master in case of failures, in some cases a compilation failure may waste cluster resources and may affect the stability of the cluster and the other jobs in it, for example when the resource path is incorrect or the user program itself has problems (in this case, job failover cannot solve the problem).
In the current implementation, compilation errors are handled on the client side and there is no impact on the cluster at all.

Regarding 1), the design doc clearly points out that only per-job mode will be supported. However, I think it's better to also consider session mode in the design doc.

Regarding 2) and 3), I have not seen related sections in the design doc. It would be good if we could cover them there.

Feel free to correct me if there is anything I have misunderstood.
Regards,
Dian

On Dec 27, 2019, at 3:13 AM, Peter Huang <huangzhenqiu0...@gmail.com> wrote:

Hi Yang,

I can't agree more. The effort definitely needs to align with the final goal of FLIP-73. I am thinking about whether we can achieve the goal in two phases.

1) Phase I

As the CliFrontend will not be deprecated soon, we can still use the deployMode flag there, pass the program info through the Flink configuration, and use the ClassPathJobGraphRetriever to generate the job graph in the ClusterEntrypoints of YARN and Kubernetes.
2) Phase II

In AbstractJobClusterExecutor, the job graph is generated in the execute function. We can still use the deployMode in it. With deployMode = cluster, the execute function only starts the cluster.

When {Yarn/Kubernetes}PerJobClusterEntrypoint starts, it will start the dispatcher first; then we can use a ClusterEnvironment, similar to ContextEnvironment, to submit the job with jobName to the local dispatcher. The details need more investigation.
Let's wait for feedback from @Aljoscha Krettek <aljos...@apache.org> and @Till Rohrmann <trohrm...@apache.org> after the holiday season.

Thank you in advance. Merry Christmas and Happy New Year!!!

Best Regards
Peter Huang

On Wed, Dec 25, 2019 at 1:08 AM Yang Wang <danrtsey...@gmail.com> wrote:

Hi Peter,

I think we need to reconsider tison's suggestion seriously. After FLIP-73, deployJobCluster has been moved into `JobClusterExecutor#execute`.
It should not be visible to `CliFrontend`. That means the user program will *ALWAYS* be executed on the client side. This behavior is by design. So we cannot just add `if(client mode) .. else if(cluster mode) ...` code in `CliFrontend` to bypass the executor. We need to find a clean way to decouple executing the user program from deploying the per-job cluster. Based on this, we could support executing the user program on either the client or the master side.

Maybe Aljoscha and Jeff could give some good suggestions.
Best,
Yang

On Wed, Dec 25, 2019 at 4:03 AM Peter Huang <huangzhenqiu0...@gmail.com> wrote:

Hi Jingjing,

The proposed improvement is a deployment option for the CLI. For SQL-based Flink applications, it is more convenient to use the existing model in SqlClient, in which the job graph is generated within SqlClient. After adding the delayed job graph generation, I think no change is needed on your side.
Best Regards
Peter Huang

On Wed, Dec 18, 2019 at 6:01 AM jingjing bai <baijingjing7...@gmail.com> wrote:

Hi Peter,

We have extended SqlClient to support SQL job submission from the web, based on Flink 1.9. We also support submitting to YARN in per-job mode. In this case, the job graph is generated on the client side. I think this discussion is mainly about improving API programs. But in my case, there is no jar to upload, only a SQL string.
Do you have more suggestions for improving the SQL mode, or is it only a switch for API programs?

Best,
bai jj

On Wed, Dec 18, 2019 at 7:21 PM Yang Wang <danrtsey...@gmail.com> wrote:

I just want to revive this discussion.

Recently, I have been thinking about how to natively run a Flink per-job cluster on Kubernetes. The per-job mode on Kubernetes is very different from the one on YARN, and we will have the same deployment requirements for the client and the entry point.
1. The Flink client does not always need a local jar to start a Flink per-job cluster. We could support multiple schemes. For example, file:///path/of/my.jar means a jar located on the client side, hdfs://myhdfs/user/myname/flink/my.jar means a jar located on remote HDFS, and local:///path/in/image/my.jar means a jar located on the jobmanager side.

2. Support running the user program on the master side. This also means the entry point will generate the job graph on the master side.
We could use the ClassPathJobGraphRetriever or start a local Flink client to achieve this.

cc tison, Aljoscha & Kostas: do you think this is the right direction to work in?

On Thu, Dec 12, 2019 at 4:48 PM tison <wander4...@gmail.com> wrote:

A quick idea is that we separate the deployment from the user program, so that it is always done outside the program. When the user program is executed, there is always a ClusterClient that communicates with an existing cluster, remote or local.
That will be another thread, so this is just for your information.

Best,
tison.

On Thu, Dec 12, 2019 at 4:40 PM tison <wander4...@gmail.com> wrote:

Hi Peter,

Another concern I realized recently is that, with the current Executors abstraction (FLIP-73), I'm afraid the user program is designed to ALWAYS run on the client side. Specifically, we deploy the job in the executor when env.execute is called.
This abstraction possibly prevents Flink from running the user program on the cluster side.

For your proposal, in this case we have already compiled the program and run it on the client side, so even if we deploy a cluster and retrieve the job graph from program metadata, it doesn't make much sense.

cc Aljoscha & Kostas: what do you think about this constraint?

Best,
tison.
On Tue, Dec 10, 2019 at 12:45 PM Peter Huang <huangzhenqiu0...@gmail.com> wrote:

Hi Tison,

Yes, you are right. I think I made the wrong argument in the doc. Basically, the jar packaging problem only affects platform users. In our internal deploy service, we further optimized the deployment latency by letting users package flink-runtime together with the uber jar, so that we don't need to consider multiple Flink version support for now.
>>>>>>>> In the session client mode, Flink libs will be shipped anyway as local
>>>>>>>> resources of YARN, so users actually don't need to package those libs
>>>>>>>> into the job jar.
>>>>>>>>
>>>>>>>> Best Regards
>>>>>>>> Peter Huang
>>>>>>>>
>>>>>>>> On Mon, Dec 9, 2019 at 8:35 PM tison <wander4...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>>> 3. What do you mean about the package? Do users need to compile their
>>>>>>>>>> jars including flink-clients, flink-optimizer, flink-table codes?
>>>>>>>>>
>>>>>>>>> The answer should be no, because they exist in the system classpath.
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> tison.
>>>>>>>>>
>>>>>>>>> On Tue, Dec 10, 2019 at 12:18 PM Yang Wang <danrtsey...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Peter,
>>>>>>>>>>
>>>>>>>>>> Thanks a lot for starting this discussion. I think this is a very
>>>>>>>>>> useful feature.
>>>>>>>>>>
>>>>>>>>>> Not only for Yarn: I am focused on the Flink on Kubernetes integration
>>>>>>>>>> and came across the same problem.
>>>>>>>>>> I do not want the job graph generated on the client side. Instead, the
>>>>>>>>>> user jars are built into a user-defined image. When the job manager is
>>>>>>>>>> launched, we just need to generate the job graph based on local user
>>>>>>>>>> jars.
>>>>>>>>>>
>>>>>>>>>> I have some small suggestions about this.
>>>>>>>>>>
>>>>>>>>>> 1. `ProgramJobGraphRetriever` is very similar to
>>>>>>>>>> `ClasspathJobGraphRetriever`; the difference is that the former needs
>>>>>>>>>> `ProgramMetadata` and the latter needs some arguments.
>>>>>>>>>> Is it possible to have a unified `JobGraphRetriever` to support both?
>>>>>>>>>> 2. Is it possible to not use a local user jar to start a per-job
>>>>>>>>>> cluster? In your case, the user jars already exist on HDFS, and we do
>>>>>>>>>> need to download the jars to the deployer service. Currently, we always
>>>>>>>>>> need a local user jar to start a Flink cluster. It would be great if we
>>>>>>>>>> could support remote user jars.
>>>>>>>>>>>> In the implementation, we assume users package flink-clients,
>>>>>>>>>>>> flink-optimizer, flink-table together within the job jar. Otherwise,
>>>>>>>>>>>> the job graph generation within JobClusterEntryPoint will fail.
>>>>>>>>>> 3. What do you mean about the package? Do users need to compile their
>>>>>>>>>> jars including flink-clients, flink-optimizer, flink-table codes?
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Yang
>>>>>>>>>>
>>>>>>>>>> On Tue, Dec 10, 2019 at 2:37 AM Peter Huang <huangzhenqiu0...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Dear All,
>>>>>>>>>>>
>>>>>>>>>>> Recently, the Flink community started to improve the YARN cluster
>>>>>>>>>>> descriptor to make the job jar and config files configurable from the
>>>>>>>>>>> CLI. It improves the flexibility of Flink deployment in YARN per-job
>>>>>>>>>>> mode.
>>>>>>>>>>> For platform users who manage tens of hundreds of streaming pipelines
>>>>>>>>>>> for the whole org or company, we found that job graph generation on
>>>>>>>>>>> the client side is another pain point. Thus, we want to propose a
>>>>>>>>>>> configurable feature for FlinkYarnSessionCli.
>>>>>>>>>>> The feature allows users to choose job graph generation in the Flink
>>>>>>>>>>> ClusterEntryPoint, so that the job jar doesn't need to be available
>>>>>>>>>>> locally for the job graph generation. The proposal is organized as a
>>>>>>>>>>> FLIP:
>>>>>>>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-85+Delayed+JobGraph+Generation
>>>>>>>>>>>
>>>>>>>>>>> Any questions and suggestions are welcomed. Thank you in advance.
>>>>>>>>>>>
>>>>>>>>>>> Best Regards
>>>>>>>>>>> Peter Huang