Hi Becket,

Thanks for your suggestion. We will update the FLIP to add/enrich the following parts:

* User CLI option change: use "-R/--remote" to enable the cluster deploy mode
* Configuration change: how to specify the remote user jars and dependencies
* The whole story of how the "application mode" works: upload -> fetch -> submit job
* The cluster lifecycle: when and how the Flink cluster is destroyed

To make the last two points a bit more concrete, I put two rough sketches below.
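For the "upload -> fetch -> submit job" part, here is a rough sketch (the class and method names are only illustrative, nothing here is final API) of how the entrypoint could fetch a remote user jar with Flink's FileSystem abstraction before running the user main() on the cluster side:

import org.apache.flink.core.fs.FSDataInputStream;
import org.apache.flink.core.fs.FileSystem;
import org.apache.flink.core.fs.Path;

import java.io.File;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of the FLIP. It only shows that fetching can be
// delegated to Flink's FileSystem, so no dedicated "jar fetcher" plugin is needed.
public final class RemoteJarFetcher {

    // Copies the user jar from a remote URI (e.g. hdfs://, s3://, oss://) into the
    // working directory of the JobManager. Afterwards the entrypoint can run the
    // user main() (e.g. via something similar to ClassPathJobGraphRetriever) and
    // submit the resulting job graph to the local dispatcher.
    public static File fetch(String remoteJarUri, File workingDir) throws Exception {
        Path remotePath = new Path(remoteJarUri);
        // FileSystem.get() resolves the scheme to the matching file system implementation.
        FileSystem fs = FileSystem.get(remotePath.toUri());
        File localJar = new File(workingDir, remotePath.getName());
        try (FSDataInputStream in = fs.open(remotePath)) {
            Files.copy(in, localJar.toPath(), StandardCopyOption.REPLACE_EXISTING);
        }
        return localJar;
    }
}

A "local:///path/in/image/my.jar" style path would simply skip this step, since the jar is already available on the JobManager side.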
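For the cluster lifecycle part, the intended precondition (as tison describes further down in the thread) is that the cluster is torn down only when the user main() has returned AND every job it submitted has reached a globally terminal state. A minimal sketch of that condition, again with purely illustrative names:

import java.util.Collection;

// Hypothetical sketch, not the FLIP's API: it only encodes the shutdown precondition.
final class ApplicationLifecycle {

    enum JobState { RUNNING, FINISHED, CANCELED, FAILED }

    // The cluster is released only when both conditions hold; there is no option to
    // keep it alive afterwards, since it is bound to exactly one application.
    static boolean shouldTearDownCluster(boolean userMainReturned, Collection<JobState> jobStates) {
        boolean allJobsGloballyTerminal =
                jobStates.stream().allMatch(state -> state != JobState.RUNNING);
        return userMainReturned && allJobsGloballyTerminal;
    }
}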
Best, Yang Becket Qin <becket....@gmail.com> 于2020年3月9日周一 下午12:34写道: > Thanks for the reply, tison and Yang, > > Regarding the public interface, is "-R/--remote" option the only change? > Will the users also need to provide a remote location to upload and store > the jars, and a list of jars as dependencies to be uploaded? > > It would be important that the public interface section in the FLIP > includes all the user sensible changes including the CLI / configuration / > metrics, etc. Can we update the FLIP to include the conclusion we have here > in the ML? > > Thanks, > > Jiangjie (Becket) Qin > > On Mon, Mar 9, 2020 at 11:59 AM Yang Wang <danrtsey...@gmail.com> wrote: > >> Hi Becket, >> >> Thanks for jumping out and sharing your concerns. I second tison's answer >> and just >> make some additions. >> >> >> > job submission interface >> >> This FLIP will introduce an interface for running user `main()` on >> cluster, named as >> “ProgramDeployer”. However, it is not a public interface. It will be used >> in `CliFrontend` >> when the remote deploy option(-R/--remote-deploy) is specified. So the >> only changes >> on user side is about the cli option. >> >> >> > How to fetch the jars? >> >> The “local path” and “dfs path“ could be supported to fetch the user jars >> and dependencies. >> Just like tison has said, we could ship the user jar and dependencies >> from client side to >> HDFS and use the entrypoint to fetch. >> >> Also we have some other practical ways to use the new “application mode“. >> 1. Upload the user jars and dependencies to the DFS(e.g. HDFS, S3, Aliyun >> OSS) manually >> or some external deployer system. For K8s, the user jars and dependencies >> could also be >> built in the docker image. >> 2. Specify the remote/local user jar and dependencies in `flink run`. >> Usually this could also >> be done by the external deployer system. >> 3. When the `ClusterEntrypoint` is launched, it will fetch the jars and >> files automatically. We >> do not need any specific fetcher implementation. Since we could leverage >> flink `FileSystem` >> to do this. >> >> >> >> >> >> Best, >> Yang >> >> tison <wander4...@gmail.com> 于2020年3月9日周一 上午11:34写道: >> >>> Hi Becket, >>> >>> Thanks for your attention on FLIP-85! I answered your question inline. >>> >>> 1. What exactly the job submission interface will look like after this >>> FLIP? The FLIP template has a Public Interface section but was removed from >>> this FLIP. >>> >>> As Yang mentioned in this thread above: >>> >>> From user perspective, only a `-R/-- remote-deploy` cli option is >>> visible. They are not aware of the application mode. >>> >>> 2. How will the new ClusterEntrypoint fetch the jars from external >>> storage? What external storage will be supported out of the box? Will this >>> "jar fetcher" be pluggable? If so, how does the API look like and how will >>> users specify the custom "jar fetcher"? >>> >>> It depends actually. Here are several points: >>> >>> i. Currently, shipping user files is handled by Flink, dependencies >>> fetching can be handled by Flink. >>> ii. Current, we only support local file system shipfiles. When in >>> Application Mode, to support meaningful jar fetch we should support user to >>> configure richer shipfiles schema at first. >>> iii. Dependencies fetching varies from deployments. That is, on YARN, >>> its convention is through HDFS; on Kubernetes, its convention is configured >>> resource server and fetched by initContainer. 
>>> >>> Thus, in the First phase of Application Mode dependencies fetching is >>> totally handled within Flink. >>> >>> 3. It sounds that in this FLIP, the "session cluster" running the >>> application has the same lifecycle as the user application. How will the >>> session cluster be teared down after the application finishes? Will the >>> ClusterEntrypoint do that? Will there be an option of not tearing the >>> cluster down? >>> >>> The precondition we tear down the cluster is *both* >>> >>> i. user main reached to its end >>> ii. all jobs submitted(current, at most one) reached global terminate >>> state >>> >>> For the "how", it is an implementation topic, but conceptually it is >>> ClusterEntrypoint's responsibility. >>> >>> >Will there be an option of not tearing the cluster down? >>> >>> I think the answer is "No" because the cluster is designed to be bounded >>> with an Application. User logic that communicates with the job is always in >>> its `main`, and for history information we have history server. >>> >>> Best, >>> tison. >>> >>> >>> Becket Qin <becket....@gmail.com> 于2020年3月9日周一 上午8:12写道: >>> >>>> Hi Peter and Kostas, >>>> >>>> Thanks for creating this FLIP. Moving the JobGraph compilation to the >>>> cluster makes a lot of sense to me. FLIP-40 had the exactly same idea, but >>>> is currently dormant and can probably be superseded by this FLIP. After >>>> reading the FLIP, I still have a few questions. >>>> >>>> 1. What exactly the job submission interface will look like after this >>>> FLIP? The FLIP template has a Public Interface section but was removed from >>>> this FLIP. >>>> 2. How will the new ClusterEntrypoint fetch the jars from external >>>> storage? What external storage will be supported out of the box? Will this >>>> "jar fetcher" be pluggable? If so, how does the API look like and how will >>>> users specify the custom "jar fetcher"? >>>> 3. It sounds that in this FLIP, the "session cluster" running the >>>> application has the same lifecycle as the user application. How will the >>>> session cluster be teared down after the application finishes? Will the >>>> ClusterEntrypoint do that? Will there be an option of not tearing the >>>> cluster down? >>>> >>>> Maybe they have been discussed in the ML earlier, but I think they >>>> should be part of the FLIP also. >>>> >>>> Thanks, >>>> >>>> Jiangjie (Becket) Qin >>>> >>>> On Thu, Mar 5, 2020 at 10:09 PM Kostas Kloudas <kklou...@gmail.com> >>>> wrote: >>>> >>>>> Also from my side +1 to start voting. >>>>> >>>>> Cheers, >>>>> Kostas >>>>> >>>>> On Thu, Mar 5, 2020 at 7:45 AM tison <wander4...@gmail.com> wrote: >>>>> > >>>>> > +1 to star voting. >>>>> > >>>>> > Best, >>>>> > tison. >>>>> > >>>>> > >>>>> > Yang Wang <danrtsey...@gmail.com> 于2020年3月5日周四 下午2:29写道: >>>>> >> >>>>> >> Hi Peter, >>>>> >> Really thanks for your response. >>>>> >> >>>>> >> Hi all @Kostas Kloudas @Zili Chen @Peter Huang @Rong Rong >>>>> >> It seems that we have reached an agreement. The “application mode” >>>>> is regarded as the enhanced “per-job”. It is >>>>> >> orthogonal with “cluster deploy”. Currently, we bind the “per-job” >>>>> to `run-user-main-on-client` and “application mode” >>>>> >> to `run-user-main-on-cluster`. >>>>> >> >>>>> >> Do you have other concerns to moving FLIP-85 to voting? >>>>> >> >>>>> >> >>>>> >> Best, >>>>> >> Yang >>>>> >> >>>>> >> Peter Huang <huangzhenqiu0...@gmail.com> 于2020年3月5日周四 下午12:48写道: >>>>> >>> >>>>> >>> Hi Yang and Kostas, >>>>> >>> >>>>> >>> Thanks for the clarification. 
It makes more sense to me if the >>>>> long term goal is to replace per job mode to application mode >>>>> >>> in the future (at the time that multiple execute can be >>>>> supported). Before that, It will be better to keep the concept of >>>>> >>> application mode internally. As Yang suggested, User only need to >>>>> use a `-R/-- remote-deploy` cli option to launch >>>>> >>> a per job cluster with the main function executed in cluster >>>>> entry-point. +1 for the execution plan. >>>>> >>> >>>>> >>> >>>>> >>> >>>>> >>> Best Regards >>>>> >>> Peter Huang >>>>> >>> >>>>> >>> >>>>> >>> >>>>> >>> >>>>> >>> On Tue, Mar 3, 2020 at 7:11 AM Yang Wang <danrtsey...@gmail.com> >>>>> wrote: >>>>> >>>> >>>>> >>>> Hi Peter, >>>>> >>>> >>>>> >>>> Having the application mode does not mean we will drop the >>>>> cluster-deploy >>>>> >>>> option. I just want to share some thoughts about “Application >>>>> Mode”. >>>>> >>>> >>>>> >>>> >>>>> >>>> 1. The application mode could cover the per-job sematic. Its >>>>> lifecyle is bound >>>>> >>>> to the user `main()`. And all the jobs in the user main will be >>>>> executed in a same >>>>> >>>> Flink cluster. In first phase of FLIP-85 implementation, running >>>>> user main on the >>>>> >>>> cluster side could be supported in application mode. >>>>> >>>> >>>>> >>>> 2. Maybe in the future, we also need to support multiple >>>>> `execute()` on client side >>>>> >>>> in a same Flink cluster. Then the per-job mode will evolve to >>>>> application mode. >>>>> >>>> >>>>> >>>> 3. From user perspective, only a `-R/-- remote-deploy` cli option >>>>> is visible. They >>>>> >>>> are not aware of the application mode. >>>>> >>>> >>>>> >>>> 4. In the first phase, the application mode is working as >>>>> “per-job”(only one job in >>>>> >>>> the user main). We just leave more potential for the future. >>>>> >>>> >>>>> >>>> >>>>> >>>> I am not against with calling it “cluster deploy mode” if you all >>>>> think it is clearer for users. >>>>> >>>> >>>>> >>>> >>>>> >>>> >>>>> >>>> Best, >>>>> >>>> Yang >>>>> >>>> >>>>> >>>> Kostas Kloudas <kklou...@gmail.com> 于2020年3月3日周二 下午6:49写道: >>>>> >>>>> >>>>> >>>>> Hi Peter, >>>>> >>>>> >>>>> >>>>> I understand your point. This is why I was also a bit torn about >>>>> the >>>>> >>>>> name and my proposal was a bit aligned with yours (something >>>>> along the >>>>> >>>>> lines of "cluster deploy" mode). >>>>> >>>>> >>>>> >>>>> But many of the other participants in the discussion suggested >>>>> the >>>>> >>>>> "Application Mode". I think that the reasoning is that now the >>>>> user's >>>>> >>>>> Application is more self-contained. >>>>> >>>>> It will be submitted to the cluster and the user can just >>>>> disconnect. >>>>> >>>>> In addition, as discussed briefly in the doc, in the future >>>>> there may >>>>> >>>>> be better support for multi-execute applications which will >>>>> bring us >>>>> >>>>> one step closer to the true "Application Mode". But this is how I >>>>> >>>>> interpreted their arguments, of course they can also express >>>>> their >>>>> >>>>> thoughts on the topic :) >>>>> >>>>> >>>>> >>>>> Cheers, >>>>> >>>>> Kostas >>>>> >>>>> >>>>> >>>>> On Mon, Mar 2, 2020 at 6:15 PM Peter Huang < >>>>> huangzhenqiu0...@gmail.com> wrote: >>>>> >>>>> > >>>>> >>>>> > Hi Kostas, >>>>> >>>>> > >>>>> >>>>> > Thanks for updating the wiki. We have aligned with the >>>>> implementations in the doc. But I feel it is still a little bit confusing >>>>> of the naming from a user's perspective. 
It is well known that Flink >>>>> support per job cluster and session cluster. The concept is in the layer >>>>> of >>>>> how a job is managed within Flink. The method introduced util now is a >>>>> kind >>>>> of mixing job and session cluster to promising the implementation >>>>> complexity. We probably don't need to label it as Application Model as the >>>>> same layer of per job cluster and session cluster. Conceptually, I think >>>>> it >>>>> is still a cluster mode implementation for per job cluster. >>>>> >>>>> > >>>>> >>>>> > To minimize the confusion of users, I think it would be better >>>>> just an option of per job cluster for each type of cluster manager. How do >>>>> you think? >>>>> >>>>> > >>>>> >>>>> > >>>>> >>>>> > Best Regards >>>>> >>>>> > Peter Huang >>>>> >>>>> > >>>>> >>>>> > >>>>> >>>>> > >>>>> >>>>> > >>>>> >>>>> > >>>>> >>>>> > >>>>> >>>>> > >>>>> >>>>> > >>>>> >>>>> > On Mon, Mar 2, 2020 at 7:22 AM Kostas Kloudas < >>>>> kklou...@gmail.com> wrote: >>>>> >>>>> >> >>>>> >>>>> >> Hi Yang, >>>>> >>>>> >> >>>>> >>>>> >> The difference between per-job and application mode is that, >>>>> as you >>>>> >>>>> >> described, in the per-job mode the main is executed on the >>>>> client >>>>> >>>>> >> while in the application mode, the main is executed on the >>>>> cluster. >>>>> >>>>> >> I do not think we have to offer "application mode" with >>>>> running the >>>>> >>>>> >> main on the client side as this is exactly what the per-job >>>>> mode does >>>>> >>>>> >> currently and, as you described also, it would be redundant. >>>>> >>>>> >> >>>>> >>>>> >> Sorry if this was not clear in the document. >>>>> >>>>> >> >>>>> >>>>> >> Cheers, >>>>> >>>>> >> Kostas >>>>> >>>>> >> >>>>> >>>>> >> On Mon, Mar 2, 2020 at 3:17 PM Yang Wang < >>>>> danrtsey...@gmail.com> wrote: >>>>> >>>>> >> > >>>>> >>>>> >> > Hi Kostas, >>>>> >>>>> >> > >>>>> >>>>> >> > Thanks a lot for your conclusion and updating the FLIP-85 >>>>> WIKI. Currently, i have no more >>>>> >>>>> >> > questions about motivation, approach, fault tolerance and >>>>> the first phase implementation. >>>>> >>>>> >> > >>>>> >>>>> >> > I think the new title "Flink Application Mode" makes a lot >>>>> senses to me. Especially for the >>>>> >>>>> >> > containerized environment, the cluster deploy option will >>>>> be very useful. >>>>> >>>>> >> > >>>>> >>>>> >> > Just one concern, how do we introduce this new application >>>>> mode to our users? >>>>> >>>>> >> > Each user program(i.e. `main()`) is an application. >>>>> Currently, we intend to only support one >>>>> >>>>> >> > `execute()`. So what's the difference between per-job and >>>>> application mode? >>>>> >>>>> >> > >>>>> >>>>> >> > For per-job, user `main()` is always executed on client >>>>> side. And For application mode, user >>>>> >>>>> >> > `main()` could be executed on client or master >>>>> side(configured via cli option). >>>>> >>>>> >> > Right? We need to have a clear concept. Otherwise, the >>>>> users will be more and more confusing. 
>>>>> >>>>> >> > >>>>> >>>>> >> > >>>>> >>>>> >> > Best, >>>>> >>>>> >> > Yang >>>>> >>>>> >> > >>>>> >>>>> >> > Kostas Kloudas <kklou...@gmail.com> 于2020年3月2日周一 下午5:58写道: >>>>> >>>>> >> >> >>>>> >>>>> >> >> Hi all, >>>>> >>>>> >> >> >>>>> >>>>> >> >> I update >>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-85+Flink+Application+Mode >>>>> >>>>> >> >> based on the discussion we had here: >>>>> >>>>> >> >> >>>>> >>>>> >> >> >>>>> https://docs.google.com/document/d/1ji72s3FD9DYUyGuKnJoO4ApzV-nSsZa0-bceGXW7Ocw/edit# >>>>> >>>>> >> >> >>>>> >>>>> >> >> Please let me know what you think and please keep the >>>>> discussion in the ML :) >>>>> >>>>> >> >> >>>>> >>>>> >> >> Thanks for starting the discussion and I hope that soon we >>>>> will be >>>>> >>>>> >> >> able to vote on the FLIP. >>>>> >>>>> >> >> >>>>> >>>>> >> >> Cheers, >>>>> >>>>> >> >> Kostas >>>>> >>>>> >> >> >>>>> >>>>> >> >> On Thu, Jan 16, 2020 at 3:40 AM Yang Wang < >>>>> danrtsey...@gmail.com> wrote: >>>>> >>>>> >> >> > >>>>> >>>>> >> >> > Hi all, >>>>> >>>>> >> >> > >>>>> >>>>> >> >> > Thanks a lot for the feedback from @Kostas Kloudas. Your >>>>> all concerns are >>>>> >>>>> >> >> > on point. The FLIP-85 is mainly >>>>> >>>>> >> >> > focused on supporting cluster mode for per-job. Since it >>>>> is more urgent and >>>>> >>>>> >> >> > have much more use >>>>> >>>>> >> >> > cases both in Yarn and Kubernetes deployment. For >>>>> session cluster, we could >>>>> >>>>> >> >> > have more discussion >>>>> >>>>> >> >> > in a new thread later. >>>>> >>>>> >> >> > >>>>> >>>>> >> >> > #1, How to download the user jars and dependencies for >>>>> per-job in cluster >>>>> >>>>> >> >> > mode? >>>>> >>>>> >> >> > For Yarn, we could register the user jars and >>>>> dependencies as >>>>> >>>>> >> >> > LocalResource. They will be distributed >>>>> >>>>> >> >> > by Yarn. And once the JobManager and TaskManager >>>>> launched, the jars are >>>>> >>>>> >> >> > already exists. >>>>> >>>>> >> >> > For Standalone per-job and K8s, we expect that the user >>>>> jars >>>>> >>>>> >> >> > and dependencies are built into the image. >>>>> >>>>> >> >> > Or the InitContainer could be used for downloading. It >>>>> is natively >>>>> >>>>> >> >> > distributed and we will not have bottleneck. >>>>> >>>>> >> >> > >>>>> >>>>> >> >> > #2, Job graph recovery >>>>> >>>>> >> >> > We could have an optimization to store job graph on the >>>>> DFS. However, i >>>>> >>>>> >> >> > suggest building a new jobgraph >>>>> >>>>> >> >> > from the configuration is the default option. Since we >>>>> will not always have >>>>> >>>>> >> >> > a DFS store when deploying a >>>>> >>>>> >> >> > Flink per-job cluster. Of course, we assume that using >>>>> the same >>>>> >>>>> >> >> > configuration(e.g. job_id, user_jar, main_class, >>>>> >>>>> >> >> > main_args, parallelism, savepoint_settings, etc.) will >>>>> get a same job >>>>> >>>>> >> >> > graph. I think the standalone per-job >>>>> >>>>> >> >> > already has the similar behavior. >>>>> >>>>> >> >> > >>>>> >>>>> >> >> > #3, What happens with jobs that have multiple execute >>>>> calls? >>>>> >>>>> >> >> > Currently, it is really a problem. Even we use a local >>>>> client on Flink >>>>> >>>>> >> >> > master side, it will have different behavior with >>>>> >>>>> >> >> > client mode. For client mode, if we execute multiple >>>>> times, then we will >>>>> >>>>> >> >> > deploy multiple Flink clusters for each execute. >>>>> >>>>> >> >> > I am not pretty sure whether it is reasonable. 
However, >>>>> i still think using >>>>> >>>>> >> >> > the local client is a good choice. We could >>>>> >>>>> >> >> > continue the discussion in a new thread. @Zili Chen < >>>>> wander4...@gmail.com> Do >>>>> >>>>> >> >> > you want to drive this? >>>>> >>>>> >> >> > >>>>> >>>>> >> >> > >>>>> >>>>> >> >> > >>>>> >>>>> >> >> > Best, >>>>> >>>>> >> >> > Yang >>>>> >>>>> >> >> > >>>>> >>>>> >> >> > Peter Huang <huangzhenqiu0...@gmail.com> 于2020年1月16日周四 >>>>> 上午1:55写道: >>>>> >>>>> >> >> > >>>>> >>>>> >> >> > > Hi Kostas, >>>>> >>>>> >> >> > > >>>>> >>>>> >> >> > > Thanks for this feedback. I can't agree more about the >>>>> opinion. The >>>>> >>>>> >> >> > > cluster mode should be added >>>>> >>>>> >> >> > > first in per job cluster. >>>>> >>>>> >> >> > > >>>>> >>>>> >> >> > > 1) For job cluster implementation >>>>> >>>>> >> >> > > 1. Job graph recovery from configuration or store as >>>>> static job graph as >>>>> >>>>> >> >> > > session cluster. I think the static one will be better >>>>> for less recovery >>>>> >>>>> >> >> > > time. >>>>> >>>>> >> >> > > Let me update the doc for details. >>>>> >>>>> >> >> > > >>>>> >>>>> >> >> > > 2. For job execute multiple times, I think @Zili Chen >>>>> >>>>> >> >> > > <wander4...@gmail.com> has proposed the local client >>>>> solution that can >>>>> >>>>> >> >> > > the run program actually in the cluster entry point. >>>>> We can put the >>>>> >>>>> >> >> > > implementation in the second stage, >>>>> >>>>> >> >> > > or even a new FLIP for further discussion. >>>>> >>>>> >> >> > > >>>>> >>>>> >> >> > > 2) For session cluster implementation >>>>> >>>>> >> >> > > We can disable the cluster mode for the session >>>>> cluster in the first >>>>> >>>>> >> >> > > stage. I agree the jar downloading will be a painful >>>>> thing. >>>>> >>>>> >> >> > > We can consider about PoC and performance evaluation >>>>> first. If the end to >>>>> >>>>> >> >> > > end experience is good enough, then we can consider >>>>> >>>>> >> >> > > proceeding with the solution. >>>>> >>>>> >> >> > > >>>>> >>>>> >> >> > > Looking forward to more opinions from @Yang Wang < >>>>> danrtsey...@gmail.com> @Zili >>>>> >>>>> >> >> > > Chen <wander4...@gmail.com> @Dian Fu < >>>>> dian0511...@gmail.com>. >>>>> >>>>> >> >> > > >>>>> >>>>> >> >> > > >>>>> >>>>> >> >> > > Best Regards >>>>> >>>>> >> >> > > Peter Huang >>>>> >>>>> >> >> > > >>>>> >>>>> >> >> > > On Wed, Jan 15, 2020 at 7:50 AM Kostas Kloudas < >>>>> kklou...@gmail.com> wrote: >>>>> >>>>> >> >> > > >>>>> >>>>> >> >> > >> Hi all, >>>>> >>>>> >> >> > >> >>>>> >>>>> >> >> > >> I am writing here as the discussion on the Google Doc >>>>> seems to be a >>>>> >>>>> >> >> > >> bit difficult to follow. >>>>> >>>>> >> >> > >> >>>>> >>>>> >> >> > >> I think that in order to be able to make progress, it >>>>> would be helpful >>>>> >>>>> >> >> > >> to focus on per-job mode for now. >>>>> >>>>> >> >> > >> The reason is that: >>>>> >>>>> >> >> > >> 1) making the (unique) JobSubmitHandler responsible >>>>> for creating the >>>>> >>>>> >> >> > >> jobgraphs, >>>>> >>>>> >> >> > >> which includes downloading dependencies, is not an >>>>> optimal solution >>>>> >>>>> >> >> > >> 2) even if we put the responsibility on the >>>>> JobMaster, currently each >>>>> >>>>> >> >> > >> job has its own >>>>> >>>>> >> >> > >> JobMaster but they all run on the same process, so >>>>> we have again a >>>>> >>>>> >> >> > >> single entity. 
>>>>> >>>>> >> >> > >> >>>>> >>>>> >> >> > >> Of course after this is done, and if we feel >>>>> comfortable with the >>>>> >>>>> >> >> > >> solution, then we can go to the session mode. >>>>> >>>>> >> >> > >> >>>>> >>>>> >> >> > >> A second comment has to do with fault-tolerance in >>>>> the per-job, >>>>> >>>>> >> >> > >> cluster-deploy mode. >>>>> >>>>> >> >> > >> In the document, it is suggested that upon recovery, >>>>> the JobMaster of >>>>> >>>>> >> >> > >> each job re-creates the JobGraph. >>>>> >>>>> >> >> > >> I am just wondering if it is better to create and >>>>> store the jobGraph >>>>> >>>>> >> >> > >> upon submission and only fetch it >>>>> >>>>> >> >> > >> upon recovery so that we have a static jobGraph. >>>>> >>>>> >> >> > >> >>>>> >>>>> >> >> > >> Finally, I have a question which is what happens with >>>>> jobs that have >>>>> >>>>> >> >> > >> multiple execute calls? >>>>> >>>>> >> >> > >> The semantics seem to change compared to the current >>>>> behaviour, right? >>>>> >>>>> >> >> > >> >>>>> >>>>> >> >> > >> Cheers, >>>>> >>>>> >> >> > >> Kostas >>>>> >>>>> >> >> > >> >>>>> >>>>> >> >> > >> On Wed, Jan 8, 2020 at 8:05 PM tison < >>>>> wander4...@gmail.com> wrote: >>>>> >>>>> >> >> > >> > >>>>> >>>>> >> >> > >> > not always, Yang Wang is also not yet a committer >>>>> but he can join the >>>>> >>>>> >> >> > >> > channel. I cannot find the id by clicking “Add new >>>>> member in channel” so >>>>> >>>>> >> >> > >> > come to you and ask for try out the link. Possibly >>>>> I will find other >>>>> >>>>> >> >> > >> ways >>>>> >>>>> >> >> > >> > but the original purpose is that the slack channel >>>>> is a public area we >>>>> >>>>> >> >> > >> > discuss about developing... >>>>> >>>>> >> >> > >> > Best, >>>>> >>>>> >> >> > >> > tison. >>>>> >>>>> >> >> > >> > >>>>> >>>>> >> >> > >> > >>>>> >>>>> >> >> > >> > Peter Huang <huangzhenqiu0...@gmail.com> >>>>> 于2020年1月9日周四 上午2:44写道: >>>>> >>>>> >> >> > >> > >>>>> >>>>> >> >> > >> > > Hi Tison, >>>>> >>>>> >> >> > >> > > >>>>> >>>>> >> >> > >> > > I am not the committer of Flink yet. I think I >>>>> can't join it also. >>>>> >>>>> >> >> > >> > > >>>>> >>>>> >> >> > >> > > >>>>> >>>>> >> >> > >> > > Best Regards >>>>> >>>>> >> >> > >> > > Peter Huang >>>>> >>>>> >> >> > >> > > >>>>> >>>>> >> >> > >> > > On Wed, Jan 8, 2020 at 9:39 AM tison < >>>>> wander4...@gmail.com> wrote: >>>>> >>>>> >> >> > >> > > >>>>> >>>>> >> >> > >> > > > Hi Peter, >>>>> >>>>> >> >> > >> > > > >>>>> >>>>> >> >> > >> > > > Could you try out this link? >>>>> >>>>> >> >> > >> > > https://the-asf.slack.com/messages/CNA3ADZPH >>>>> >>>>> >> >> > >> > > > >>>>> >>>>> >> >> > >> > > > Best, >>>>> >>>>> >> >> > >> > > > tison. >>>>> >>>>> >> >> > >> > > > >>>>> >>>>> >> >> > >> > > > >>>>> >>>>> >> >> > >> > > > Peter Huang <huangzhenqiu0...@gmail.com> >>>>> 于2020年1月9日周四 上午1:22写道: >>>>> >>>>> >> >> > >> > > > >>>>> >>>>> >> >> > >> > > > > Hi Tison, >>>>> >>>>> >> >> > >> > > > > >>>>> >>>>> >> >> > >> > > > > I can't join the group with shared link. >>>>> Would you please add me >>>>> >>>>> >> >> > >> into >>>>> >>>>> >> >> > >> > > the >>>>> >>>>> >> >> > >> > > > > group? My slack account is huangzhenqiu0825. >>>>> >>>>> >> >> > >> > > > > Thank you in advance. 
>>>>> >>>>> >> >> > >> > > > > >>>>> >>>>> >> >> > >> > > > > >>>>> >>>>> >> >> > >> > > > > Best Regards >>>>> >>>>> >> >> > >> > > > > Peter Huang >>>>> >>>>> >> >> > >> > > > > >>>>> >>>>> >> >> > >> > > > > On Wed, Jan 8, 2020 at 12:02 AM tison < >>>>> wander4...@gmail.com> >>>>> >>>>> >> >> > >> wrote: >>>>> >>>>> >> >> > >> > > > > >>>>> >>>>> >> >> > >> > > > > > Hi Peter, >>>>> >>>>> >> >> > >> > > > > > >>>>> >>>>> >> >> > >> > > > > > As described above, this effort should get >>>>> attention from people >>>>> >>>>> >> >> > >> > > > > developing >>>>> >>>>> >> >> > >> > > > > > FLIP-73 a.k.a. Executor abstractions. I >>>>> recommend you to join >>>>> >>>>> >> >> > >> the >>>>> >>>>> >> >> > >> > > > public >>>>> >>>>> >> >> > >> > > > > > slack channel[1] for Flink Client API >>>>> Enhancement and you can >>>>> >>>>> >> >> > >> try to >>>>> >>>>> >> >> > >> > > > > share >>>>> >>>>> >> >> > >> > > > > > you detailed thoughts there. It possibly >>>>> gets more concrete >>>>> >>>>> >> >> > >> > > attentions. >>>>> >>>>> >> >> > >> > > > > > >>>>> >>>>> >> >> > >> > > > > > Best, >>>>> >>>>> >> >> > >> > > > > > tison. >>>>> >>>>> >> >> > >> > > > > > >>>>> >>>>> >> >> > >> > > > > > [1] >>>>> >>>>> >> >> > >> > > > > > >>>>> >>>>> >> >> > >> > > > > > >>>>> >>>>> >> >> > >> > > > > >>>>> >>>>> >> >> > >> > > > >>>>> >>>>> >> >> > >> > > >>>>> >>>>> >> >> > >> >>>>> https://slack.com/share/IS21SJ75H/Rk8HhUly9FuEHb7oGwBZ33uL/enQtODg2MDYwNjE5MTg3LTA2MjIzNDc1M2ZjZDVlMjdlZjk1M2RkYmJhNjAwMTk2ZDZkODQ4NmY5YmI4OGRhNWJkYTViMTM1NzlmMzc4OWM >>>>> >>>>> >> >> > >> > > > > > >>>>> >>>>> >> >> > >> > > > > > >>>>> >>>>> >> >> > >> > > > > > Peter Huang <huangzhenqiu0...@gmail.com> >>>>> 于2020年1月7日周二 上午5:09写道: >>>>> >>>>> >> >> > >> > > > > > >>>>> >>>>> >> >> > >> > > > > > > Dear All, >>>>> >>>>> >> >> > >> > > > > > > >>>>> >>>>> >> >> > >> > > > > > > Happy new year! According to existing >>>>> feedback from the >>>>> >>>>> >> >> > >> community, >>>>> >>>>> >> >> > >> > > we >>>>> >>>>> >> >> > >> > > > > > > revised the doc with the consideration of >>>>> session cluster >>>>> >>>>> >> >> > >> support, >>>>> >>>>> >> >> > >> > > > and >>>>> >>>>> >> >> > >> > > > > > > concrete interface changes needed and >>>>> execution plan. Please >>>>> >>>>> >> >> > >> take >>>>> >>>>> >> >> > >> > > one >>>>> >>>>> >> >> > >> > > > > > more >>>>> >>>>> >> >> > >> > > > > > > round of review at your most convenient >>>>> time. >>>>> >>>>> >> >> > >> > > > > > > >>>>> >>>>> >> >> > >> > > > > > > >>>>> >>>>> >> >> > >> > > > > > > >>>>> >>>>> >> >> > >> > > > > > >>>>> >>>>> >> >> > >> > > > > >>>>> >>>>> >> >> > >> > > > >>>>> >>>>> >> >> > >> > > >>>>> >>>>> >> >> > >> >>>>> https://docs.google.com/document/d/1aAwVjdZByA-0CHbgv16Me-vjaaDMCfhX7TzVVTuifYM/edit# >>>>> >>>>> >> >> > >> > > > > > > >>>>> >>>>> >> >> > >> > > > > > > >>>>> >>>>> >> >> > >> > > > > > > Best Regards >>>>> >>>>> >> >> > >> > > > > > > Peter Huang >>>>> >>>>> >> >> > >> > > > > > > >>>>> >>>>> >> >> > >> > > > > > > >>>>> >>>>> >> >> > >> > > > > > > >>>>> >>>>> >> >> > >> > > > > > > >>>>> >>>>> >> >> > >> > > > > > > >>>>> >>>>> >> >> > >> > > > > > > On Thu, Jan 2, 2020 at 11:29 AM Peter >>>>> Huang < >>>>> >>>>> >> >> > >> > > > > huangzhenqiu0...@gmail.com> >>>>> >>>>> >> >> > >> > > > > > > wrote: >>>>> >>>>> >> >> > >> > > > > > > >>>>> >>>>> >> >> > >> > > > > > > > Hi Dian, >>>>> >>>>> >> >> > >> > > > > > > > Thanks for giving us valuable feedbacks. 
>>>>> >>>>> >> >> > >> > > > > > > > >>>>> >>>>> >> >> > >> > > > > > > > 1) It's better to have a whole design >>>>> for this feature >>>>> >>>>> >> >> > >> > > > > > > > For the suggestion of enabling the >>>>> cluster mode also session >>>>> >>>>> >> >> > >> > > > > cluster, I >>>>> >>>>> >> >> > >> > > > > > > > think Flink already supported it. >>>>> WebSubmissionExtension >>>>> >>>>> >> >> > >> already >>>>> >>>>> >> >> > >> > > > > allows >>>>> >>>>> >> >> > >> > > > > > > > users to start a job with the specified >>>>> jar by using web UI. >>>>> >>>>> >> >> > >> > > > > > > > But we need to enable the feature from >>>>> CLI for both local >>>>> >>>>> >> >> > >> jar, >>>>> >>>>> >> >> > >> > > > remote >>>>> >>>>> >> >> > >> > > > > > > jar. >>>>> >>>>> >> >> > >> > > > > > > > I will align with Yang Wang first about >>>>> the details and >>>>> >>>>> >> >> > >> update >>>>> >>>>> >> >> > >> > > the >>>>> >>>>> >> >> > >> > > > > > design >>>>> >>>>> >> >> > >> > > > > > > > doc. >>>>> >>>>> >> >> > >> > > > > > > > >>>>> >>>>> >> >> > >> > > > > > > > 2) It's better to consider the >>>>> convenience for users, such >>>>> >>>>> >> >> > >> as >>>>> >>>>> >> >> > >> > > > > debugging >>>>> >>>>> >> >> > >> > > > > > > > >>>>> >>>>> >> >> > >> > > > > > > > I am wondering whether we can store the >>>>> exception in >>>>> >>>>> >> >> > >> jobgragh >>>>> >>>>> >> >> > >> > > > > > > > generation in application master. As no >>>>> streaming graph can >>>>> >>>>> >> >> > >> be >>>>> >>>>> >> >> > >> > > > > > scheduled >>>>> >>>>> >> >> > >> > > > > > > in >>>>> >>>>> >> >> > >> > > > > > > > this case, there will be no more TM >>>>> will be requested from >>>>> >>>>> >> >> > >> > > FlinkRM. >>>>> >>>>> >> >> > >> > > > > > > > If the AM is still running, users can >>>>> still query it from >>>>> >>>>> >> >> > >> CLI. As >>>>> >>>>> >> >> > >> > > > it >>>>> >>>>> >> >> > >> > > > > > > > requires more change, we can get some >>>>> feedback from < >>>>> >>>>> >> >> > >> > > > > > aljos...@apache.org >>>>> >>>>> >> >> > >> > > > > > > > >>>>> >>>>> >> >> > >> > > > > > > > and @zjf...@gmail.com <zjf...@gmail.com >>>>> >. >>>>> >>>>> >> >> > >> > > > > > > > >>>>> >>>>> >> >> > >> > > > > > > > 3) It's better to consider the impact >>>>> to the stability of >>>>> >>>>> >> >> > >> the >>>>> >>>>> >> >> > >> > > > cluster >>>>> >>>>> >> >> > >> > > > > > > > >>>>> >>>>> >> >> > >> > > > > > > > I agree with Yang Wang's opinion. >>>>> >>>>> >> >> > >> > > > > > > > >>>>> >>>>> >> >> > >> > > > > > > > >>>>> >>>>> >> >> > >> > > > > > > > >>>>> >>>>> >> >> > >> > > > > > > > Best Regards >>>>> >>>>> >> >> > >> > > > > > > > Peter Huang >>>>> >>>>> >> >> > >> > > > > > > > >>>>> >>>>> >> >> > >> > > > > > > > >>>>> >>>>> >> >> > >> > > > > > > > On Sun, Dec 29, 2019 at 9:44 PM Dian Fu >>>>> < >>>>> >>>>> >> >> > >> dian0511...@gmail.com> >>>>> >>>>> >> >> > >> > > > > wrote: >>>>> >>>>> >> >> > >> > > > > > > > >>>>> >>>>> >> >> > >> > > > > > > >> Hi all, >>>>> >>>>> >> >> > >> > > > > > > >> >>>>> >>>>> >> >> > >> > > > > > > >> Sorry to jump into this discussion. >>>>> Thanks everyone for the >>>>> >>>>> >> >> > >> > > > > > discussion. >>>>> >>>>> >> >> > >> > > > > > > >> I'm very interested in this topic >>>>> although I'm not an >>>>> >>>>> >> >> > >> expert in >>>>> >>>>> >> >> > >> > > > this >>>>> >>>>> >> >> > >> > > > > > > part. 
>>>>> >>>>> >> >> > >> > > > > > > >> So I'm glad to share my thoughts as >>>>> following: >>>>> >>>>> >> >> > >> > > > > > > >> >>>>> >>>>> >> >> > >> > > > > > > >> 1) It's better to have a whole design >>>>> for this feature >>>>> >>>>> >> >> > >> > > > > > > >> As we know, there are two deployment >>>>> modes: per-job mode >>>>> >>>>> >> >> > >> and >>>>> >>>>> >> >> > >> > > > session >>>>> >>>>> >> >> > >> > > > > > > >> mode. I'm wondering which mode really >>>>> needs this feature. >>>>> >>>>> >> >> > >> As the >>>>> >>>>> >> >> > >> > > > > > design >>>>> >>>>> >> >> > >> > > > > > > doc >>>>> >>>>> >> >> > >> > > > > > > >> mentioned, per-job mode is more used >>>>> for streaming jobs and >>>>> >>>>> >> >> > >> > > > session >>>>> >>>>> >> >> > >> > > > > > > mode is >>>>> >>>>> >> >> > >> > > > > > > >> usually used for batch jobs(Of course, >>>>> the job types and >>>>> >>>>> >> >> > >> the >>>>> >>>>> >> >> > >> > > > > > deployment >>>>> >>>>> >> >> > >> > > > > > > >> modes are orthogonal). Usually >>>>> streaming job is only >>>>> >>>>> >> >> > >> needed to >>>>> >>>>> >> >> > >> > > be >>>>> >>>>> >> >> > >> > > > > > > submitted >>>>> >>>>> >> >> > >> > > > > > > >> once and it will run for days or >>>>> weeks, while batch jobs >>>>> >>>>> >> >> > >> will be >>>>> >>>>> >> >> > >> > > > > > > submitted >>>>> >>>>> >> >> > >> > > > > > > >> more frequently compared with >>>>> streaming jobs. This means >>>>> >>>>> >> >> > >> that >>>>> >>>>> >> >> > >> > > > maybe >>>>> >>>>> >> >> > >> > > > > > > session >>>>> >>>>> >> >> > >> > > > > > > >> mode also needs this feature. However, >>>>> if we support this >>>>> >>>>> >> >> > >> > > feature >>>>> >>>>> >> >> > >> > > > in >>>>> >>>>> >> >> > >> > > > > > > >> session mode, the application master >>>>> will become the new >>>>> >>>>> >> >> > >> > > > centralized >>>>> >>>>> >> >> > >> > > > > > > >> service(which should be solved). So in >>>>> this case, it's >>>>> >>>>> >> >> > >> better to >>>>> >>>>> >> >> > >> > > > > have >>>>> >>>>> >> >> > >> > > > > > a >>>>> >>>>> >> >> > >> > > > > > > >> complete design for both per-job mode >>>>> and session mode. >>>>> >>>>> >> >> > >> > > > Furthermore, >>>>> >>>>> >> >> > >> > > > > > > even >>>>> >>>>> >> >> > >> > > > > > > >> if we can do it phase by phase, we >>>>> need to have a whole >>>>> >>>>> >> >> > >> picture >>>>> >>>>> >> >> > >> > > of >>>>> >>>>> >> >> > >> > > > > how >>>>> >>>>> >> >> > >> > > > > > > it >>>>> >>>>> >> >> > >> > > > > > > >> works in both per-job mode and session >>>>> mode. 
>>>>> >>>>> >> >> > >> > > > > > > >> >>>>> >>>>> >> >> > >> > > > > > > >> 2) It's better to consider the >>>>> convenience for users, such >>>>> >>>>> >> >> > >> as >>>>> >>>>> >> >> > >> > > > > > debugging >>>>> >>>>> >> >> > >> > > > > > > >> After we finish this feature, the job >>>>> graph will be >>>>> >>>>> >> >> > >> compiled in >>>>> >>>>> >> >> > >> > > > the >>>>> >>>>> >> >> > >> > > > > > > >> application master, which means that >>>>> users cannot easily >>>>> >>>>> >> >> > >> get the >>>>> >>>>> >> >> > >> > > > > > > exception >>>>> >>>>> >> >> > >> > > > > > > >> message synchorousely in the job >>>>> client if there are >>>>> >>>>> >> >> > >> problems >>>>> >>>>> >> >> > >> > > > during >>>>> >>>>> >> >> > >> > > > > > the >>>>> >>>>> >> >> > >> > > > > > > >> job graph compiling (especially for >>>>> platform users), such >>>>> >>>>> >> >> > >> as the >>>>> >>>>> >> >> > >> > > > > > > resource >>>>> >>>>> >> >> > >> > > > > > > >> path is incorrect, the user program >>>>> itself has some >>>>> >>>>> >> >> > >> problems, >>>>> >>>>> >> >> > >> > > etc. >>>>> >>>>> >> >> > >> > > > > > What >>>>> >>>>> >> >> > >> > > > > > > I'm >>>>> >>>>> >> >> > >> > > > > > > >> thinking is that maybe we should throw >>>>> the exceptions as >>>>> >>>>> >> >> > >> early >>>>> >>>>> >> >> > >> > > as >>>>> >>>>> >> >> > >> > > > > > > possible >>>>> >>>>> >> >> > >> > > > > > > >> (during job submission stage). >>>>> >>>>> >> >> > >> > > > > > > >> >>>>> >>>>> >> >> > >> > > > > > > >> 3) It's better to consider the impact >>>>> to the stability of >>>>> >>>>> >> >> > >> the >>>>> >>>>> >> >> > >> > > > > cluster >>>>> >>>>> >> >> > >> > > > > > > >> If we perform the compiling in the >>>>> application master, we >>>>> >>>>> >> >> > >> should >>>>> >>>>> >> >> > >> > > > > > > consider >>>>> >>>>> >> >> > >> > > > > > > >> the impact of the compiling errors. >>>>> Although YARN could >>>>> >>>>> >> >> > >> resume >>>>> >>>>> >> >> > >> > > the >>>>> >>>>> >> >> > >> > > > > > > >> application master in case of >>>>> failures, but in some case >>>>> >>>>> >> >> > >> the >>>>> >>>>> >> >> > >> > > > > compiling >>>>> >>>>> >> >> > >> > > > > > > >> failure may be a waste of cluster >>>>> resource and may impact >>>>> >>>>> >> >> > >> the >>>>> >>>>> >> >> > >> > > > > > stability >>>>> >>>>> >> >> > >> > > > > > > the >>>>> >>>>> >> >> > >> > > > > > > >> cluster and the other jobs in the >>>>> cluster, such as the >>>>> >>>>> >> >> > >> resource >>>>> >>>>> >> >> > >> > > > path >>>>> >>>>> >> >> > >> > > > > > is >>>>> >>>>> >> >> > >> > > > > > > >> incorrect, the user program itself has >>>>> some problems(in >>>>> >>>>> >> >> > >> this >>>>> >>>>> >> >> > >> > > case, >>>>> >>>>> >> >> > >> > > > > job >>>>> >>>>> >> >> > >> > > > > > > >> failover cannot solve this kind of >>>>> problems) etc. In the >>>>> >>>>> >> >> > >> current >>>>> >>>>> >> >> > >> > > > > > > >> implemention, the compiling errors are >>>>> handled in the >>>>> >>>>> >> >> > >> client >>>>> >>>>> >> >> > >> > > side >>>>> >>>>> >> >> > >> > > > > and >>>>> >>>>> >> >> > >> > > > > > > there >>>>> >>>>> >> >> > >> > > > > > > >> is no impact to the cluster at all. >>>>> >>>>> >> >> > >> > > > > > > >> >>>>> >>>>> >> >> > >> > > > > > > >> Regarding to 1), it's clearly pointed >>>>> in the design doc >>>>> >>>>> >> >> > >> that >>>>> >>>>> >> >> > >> > > only >>>>> >>>>> >> >> > >> > > > > > > per-job >>>>> >>>>> >> >> > >> > > > > > > >> mode will be supported. 
However, I >>>>> think it's better to >>>>> >>>>> >> >> > >> also >>>>> >>>>> >> >> > >> > > > > consider >>>>> >>>>> >> >> > >> > > > > > > the >>>>> >>>>> >> >> > >> > > > > > > >> session mode in the design doc. >>>>> >>>>> >> >> > >> > > > > > > >> Regarding to 2) and 3), I have not >>>>> seen related sections >>>>> >>>>> >> >> > >> in the >>>>> >>>>> >> >> > >> > > > > design >>>>> >>>>> >> >> > >> > > > > > > >> doc. It will be good if we can cover >>>>> them in the design >>>>> >>>>> >> >> > >> doc. >>>>> >>>>> >> >> > >> > > > > > > >> >>>>> >>>>> >> >> > >> > > > > > > >> Feel free to correct me If there is >>>>> anything I >>>>> >>>>> >> >> > >> misunderstand. >>>>> >>>>> >> >> > >> > > > > > > >> >>>>> >>>>> >> >> > >> > > > > > > >> Regards, >>>>> >>>>> >> >> > >> > > > > > > >> Dian >>>>> >>>>> >> >> > >> > > > > > > >> >>>>> >>>>> >> >> > >> > > > > > > >> >>>>> >>>>> >> >> > >> > > > > > > >> > 在 2019年12月27日,上午3:13,Peter Huang < >>>>> >>>>> >> >> > >> huangzhenqiu0...@gmail.com> >>>>> >>>>> >> >> > >> > > > 写道: >>>>> >>>>> >> >> > >> > > > > > > >> > >>>>> >>>>> >> >> > >> > > > > > > >> > Hi Yang, >>>>> >>>>> >> >> > >> > > > > > > >> > >>>>> >>>>> >> >> > >> > > > > > > >> > I can't agree more. The effort >>>>> definitely needs to align >>>>> >>>>> >> >> > >> with >>>>> >>>>> >> >> > >> > > > the >>>>> >>>>> >> >> > >> > > > > > > final >>>>> >>>>> >> >> > >> > > > > > > >> > goal of FLIP-73. >>>>> >>>>> >> >> > >> > > > > > > >> > I am thinking about whether we can >>>>> achieve the goal with >>>>> >>>>> >> >> > >> two >>>>> >>>>> >> >> > >> > > > > phases. >>>>> >>>>> >> >> > >> > > > > > > >> > >>>>> >>>>> >> >> > >> > > > > > > >> > 1) Phase I >>>>> >>>>> >> >> > >> > > > > > > >> > As the CLiFrontend will not be >>>>> depreciated soon. We can >>>>> >>>>> >> >> > >> still >>>>> >>>>> >> >> > >> > > > use >>>>> >>>>> >> >> > >> > > > > > the >>>>> >>>>> >> >> > >> > > > > > > >> > deployMode flag there, >>>>> >>>>> >> >> > >> > > > > > > >> > pass the program info through Flink >>>>> configuration, use >>>>> >>>>> >> >> > >> the >>>>> >>>>> >> >> > >> > > > > > > >> > ClassPathJobGraphRetriever >>>>> >>>>> >> >> > >> > > > > > > >> > to generate the job graph in >>>>> ClusterEntrypoints of yarn >>>>> >>>>> >> >> > >> and >>>>> >>>>> >> >> > >> > > > > > > Kubernetes. >>>>> >>>>> >> >> > >> > > > > > > >> > >>>>> >>>>> >> >> > >> > > > > > > >> > 2) Phase II >>>>> >>>>> >> >> > >> > > > > > > >> > In AbstractJobClusterExecutor, the >>>>> job graph is >>>>> >>>>> >> >> > >> generated in >>>>> >>>>> >> >> > >> > > > the >>>>> >>>>> >> >> > >> > > > > > > >> execute >>>>> >>>>> >> >> > >> > > > > > > >> > function. We can still >>>>> >>>>> >> >> > >> > > > > > > >> > use the deployMode in it. With >>>>> deployMode = cluster, the >>>>> >>>>> >> >> > >> > > execute >>>>> >>>>> >> >> > >> > > > > > > >> function >>>>> >>>>> >> >> > >> > > > > > > >> > only starts the cluster. 
>>>>> >>>>> >> >> > >> > > > > > > >> > >>>>> >>>>> >> >> > >> > > > > > > >> > When >>>>> {Yarn/Kuberneates}PerJobClusterEntrypoint starts, >>>>> >>>>> >> >> > >> It will >>>>> >>>>> >> >> > >> > > > > start >>>>> >>>>> >> >> > >> > > > > > > the >>>>> >>>>> >> >> > >> > > > > > > >> > dispatch first, then we can use >>>>> >>>>> >> >> > >> > > > > > > >> > a ClusterEnvironment similar to >>>>> ContextEnvironment to >>>>> >>>>> >> >> > >> submit >>>>> >>>>> >> >> > >> > > the >>>>> >>>>> >> >> > >> > > > > job >>>>> >>>>> >> >> > >> > > > > > > >> with >>>>> >>>>> >> >> > >> > > > > > > >> > jobName the local >>>>> >>>>> >> >> > >> > > > > > > >> > dispatcher. For the details, we need >>>>> more investigation. >>>>> >>>>> >> >> > >> Let's >>>>> >>>>> >> >> > >> > > > > wait >>>>> >>>>> >> >> > >> > > > > > > >> > for @Aljoscha >>>>> >>>>> >> >> > >> > > > > > > >> > Krettek <aljos...@apache.org> @Till >>>>> Rohrmann < >>>>> >>>>> >> >> > >> > > > > trohrm...@apache.org >>>>> >>>>> >> >> > >> > > > > > >'s >>>>> >>>>> >> >> > >> > > > > > > >> > feedback after the holiday season. >>>>> >>>>> >> >> > >> > > > > > > >> > >>>>> >>>>> >> >> > >> > > > > > > >> > Thank you in advance. Merry Chrismas >>>>> and Happy New >>>>> >>>>> >> >> > >> Year!!! >>>>> >>>>> >> >> > >> > > > > > > >> > >>>>> >>>>> >> >> > >> > > > > > > >> > >>>>> >>>>> >> >> > >> > > > > > > >> > Best Regards >>>>> >>>>> >> >> > >> > > > > > > >> > Peter Huang >>>>> >>>>> >> >> > >> > > > > > > >> > >>>>> >>>>> >> >> > >> > > > > > > >> > >>>>> >>>>> >> >> > >> > > > > > > >> > >>>>> >>>>> >> >> > >> > > > > > > >> > >>>>> >>>>> >> >> > >> > > > > > > >> > >>>>> >>>>> >> >> > >> > > > > > > >> > >>>>> >>>>> >> >> > >> > > > > > > >> > >>>>> >>>>> >> >> > >> > > > > > > >> > >>>>> >>>>> >> >> > >> > > > > > > >> > On Wed, Dec 25, 2019 at 1:08 AM Yang >>>>> Wang < >>>>> >>>>> >> >> > >> > > > danrtsey...@gmail.com> >>>>> >>>>> >> >> > >> > > > > > > >> wrote: >>>>> >>>>> >> >> > >> > > > > > > >> > >>>>> >>>>> >> >> > >> > > > > > > >> >> Hi Peter, >>>>> >>>>> >> >> > >> > > > > > > >> >> >>>>> >>>>> >> >> > >> > > > > > > >> >> I think we need to reconsider >>>>> tison's suggestion >>>>> >>>>> >> >> > >> seriously. >>>>> >>>>> >> >> > >> > > > After >>>>> >>>>> >> >> > >> > > > > > > >> FLIP-73, >>>>> >>>>> >> >> > >> > > > > > > >> >> the deployJobCluster has >>>>> >>>>> >> >> > >> > > > > > > >> >> beenmoved into >>>>> `JobClusterExecutor#execute`. It should >>>>> >>>>> >> >> > >> not be >>>>> >>>>> >> >> > >> > > > > > > perceived >>>>> >>>>> >> >> > >> > > > > > > >> >> for `CliFrontend`. That >>>>> >>>>> >> >> > >> > > > > > > >> >> means the user program will >>>>> *ALWAYS* be executed on >>>>> >>>>> >> >> > >> client >>>>> >>>>> >> >> > >> > > > side. >>>>> >>>>> >> >> > >> > > > > > This >>>>> >>>>> >> >> > >> > > > > > > >> is >>>>> >>>>> >> >> > >> > > > > > > >> >> the by design behavior. >>>>> >>>>> >> >> > >> > > > > > > >> >> So, we could not just add >>>>> `if(client mode) .. else >>>>> >>>>> >> >> > >> if(cluster >>>>> >>>>> >> >> > >> > > > > mode) >>>>> >>>>> >> >> > >> > > > > > > >> ...` >>>>> >>>>> >> >> > >> > > > > > > >> >> codes in `CliFrontend` to bypass >>>>> >>>>> >> >> > >> > > > > > > >> >> the executor. We need to find a >>>>> clean way to decouple >>>>> >>>>> >> >> > >> > > executing >>>>> >>>>> >> >> > >> > > > > > user >>>>> >>>>> >> >> > >> > > > > > > >> >> program and deploying per-job >>>>> >>>>> >> >> > >> > > > > > > >> >> cluster. 
Based on this, we could >>>>> support to execute user >>>>> >>>>> >> >> > >> > > > program >>>>> >>>>> >> >> > >> > > > > on >>>>> >>>>> >> >> > >> > > > > > > >> client >>>>> >>>>> >> >> > >> > > > > > > >> >> or master side. >>>>> >>>>> >> >> > >> > > > > > > >> >> >>>>> >>>>> >> >> > >> > > > > > > >> >> Maybe Aljoscha and Jeff could give >>>>> some good >>>>> >>>>> >> >> > >> suggestions. >>>>> >>>>> >> >> > >> > > > > > > >> >> >>>>> >>>>> >> >> > >> > > > > > > >> >> >>>>> >>>>> >> >> > >> > > > > > > >> >> >>>>> >>>>> >> >> > >> > > > > > > >> >> Best, >>>>> >>>>> >> >> > >> > > > > > > >> >> Yang >>>>> >>>>> >> >> > >> > > > > > > >> >> >>>>> >>>>> >> >> > >> > > > > > > >> >> Peter Huang < >>>>> huangzhenqiu0...@gmail.com> 于2019年12月25日周三 >>>>> >>>>> >> >> > >> > > > > 上午4:03写道: >>>>> >>>>> >> >> > >> > > > > > > >> >> >>>>> >>>>> >> >> > >> > > > > > > >> >>> Hi Jingjing, >>>>> >>>>> >> >> > >> > > > > > > >> >>> >>>>> >>>>> >> >> > >> > > > > > > >> >>> The improvement proposed is a >>>>> deployment option for >>>>> >>>>> >> >> > >> CLI. For >>>>> >>>>> >> >> > >> > > > SQL >>>>> >>>>> >> >> > >> > > > > > > based >>>>> >>>>> >> >> > >> > > > > > > >> >>> Flink application, It is more >>>>> convenient to use the >>>>> >>>>> >> >> > >> existing >>>>> >>>>> >> >> > >> > > > > model >>>>> >>>>> >> >> > >> > > > > > > in >>>>> >>>>> >> >> > >> > > > > > > >> >>> SqlClient in which >>>>> >>>>> >> >> > >> > > > > > > >> >>> the job graph is generated within >>>>> SqlClient. After >>>>> >>>>> >> >> > >> adding >>>>> >>>>> >> >> > >> > > the >>>>> >>>>> >> >> > >> > > > > > > delayed >>>>> >>>>> >> >> > >> > > > > > > >> job >>>>> >>>>> >> >> > >> > > > > > > >> >>> graph generation, I think there is >>>>> no change is needed >>>>> >>>>> >> >> > >> for >>>>> >>>>> >> >> > >> > > > your >>>>> >>>>> >> >> > >> > > > > > > side. >>>>> >>>>> >> >> > >> > > > > > > >> >>> >>>>> >>>>> >> >> > >> > > > > > > >> >>> >>>>> >>>>> >> >> > >> > > > > > > >> >>> Best Regards >>>>> >>>>> >> >> > >> > > > > > > >> >>> Peter Huang >>>>> >>>>> >> >> > >> > > > > > > >> >>> >>>>> >>>>> >> >> > >> > > > > > > >> >>> >>>>> >>>>> >> >> > >> > > > > > > >> >>> On Wed, Dec 18, 2019 at 6:01 AM >>>>> jingjing bai < >>>>> >>>>> >> >> > >> > > > > > > >> baijingjing7...@gmail.com> >>>>> >>>>> >> >> > >> > > > > > > >> >>> wrote: >>>>> >>>>> >> >> > >> > > > > > > >> >>> >>>>> >>>>> >> >> > >> > > > > > > >> >>>> hi peter: >>>>> >>>>> >> >> > >> > > > > > > >> >>>> we had extension SqlClent to >>>>> support sql job >>>>> >>>>> >> >> > >> submit in >>>>> >>>>> >> >> > >> > > web >>>>> >>>>> >> >> > >> > > > > > base >>>>> >>>>> >> >> > >> > > > > > > on >>>>> >>>>> >> >> > >> > > > > > > >> >>>> flink 1.9. we support submit to >>>>> yarn on per job >>>>> >>>>> >> >> > >> mode too. >>>>> >>>>> >> >> > >> > > > > > > >> >>>> in this case, the job graph >>>>> generated on client >>>>> >>>>> >> >> > >> side >>>>> >>>>> >> >> > >> > > . I >>>>> >>>>> >> >> > >> > > > > > think >>>>> >>>>> >> >> > >> > > > > > > >> >>> this >>>>> >>>>> >> >> > >> > > > > > > >> >>>> discuss Mainly to improve api >>>>> programme. but in my >>>>> >>>>> >> >> > >> case , >>>>> >>>>> >> >> > >> > > > > there >>>>> >>>>> >> >> > >> > > > > > is >>>>> >>>>> >> >> > >> > > > > > > >> no >>>>> >>>>> >> >> > >> > > > > > > >> >>>> jar to upload but only a sql >>>>> string . 
>>>>> >>>>> >> >> > >> > > > > > > >> >>>> do u had more suggestion to >>>>> improve for sql mode >>>>> >>>>> >> >> > >> or it >>>>> >>>>> >> >> > >> > > is >>>>> >>>>> >> >> > >> > > > > > only a >>>>> >>>>> >> >> > >> > > > > > > >> >>>> switch for api programme? >>>>> >>>>> >> >> > >> > > > > > > >> >>>> >>>>> >>>>> >> >> > >> > > > > > > >> >>>> >>>>> >>>>> >> >> > >> > > > > > > >> >>>> best >>>>> >>>>> >> >> > >> > > > > > > >> >>>> bai jj >>>>> >>>>> >> >> > >> > > > > > > >> >>>> >>>>> >>>>> >> >> > >> > > > > > > >> >>>> >>>>> >>>>> >> >> > >> > > > > > > >> >>>> Yang Wang <danrtsey...@gmail.com> >>>>> 于2019年12月18日周三 >>>>> >>>>> >> >> > >> 下午7:21写道: >>>>> >>>>> >> >> > >> > > > > > > >> >>>> >>>>> >>>>> >> >> > >> > > > > > > >> >>>>> I just want to revive this >>>>> discussion. >>>>> >>>>> >> >> > >> > > > > > > >> >>>>> >>>>> >>>>> >> >> > >> > > > > > > >> >>>>> Recently, i am thinking about >>>>> how to natively run >>>>> >>>>> >> >> > >> flink >>>>> >>>>> >> >> > >> > > > > per-job >>>>> >>>>> >> >> > >> > > > > > > >> >>> cluster on >>>>> >>>>> >> >> > >> > > > > > > >> >>>>> Kubernetes. >>>>> >>>>> >> >> > >> > > > > > > >> >>>>> The per-job mode on Kubernetes >>>>> is very different >>>>> >>>>> >> >> > >> from on >>>>> >>>>> >> >> > >> > > > Yarn. >>>>> >>>>> >> >> > >> > > > > > And >>>>> >>>>> >> >> > >> > > > > > > >> we >>>>> >>>>> >> >> > >> > > > > > > >> >>> will >>>>> >>>>> >> >> > >> > > > > > > >> >>>>> have >>>>> >>>>> >> >> > >> > > > > > > >> >>>>> the same deployment requirements >>>>> to the client and >>>>> >>>>> >> >> > >> entry >>>>> >>>>> >> >> > >> > > > > point. >>>>> >>>>> >> >> > >> > > > > > > >> >>>>> >>>>> >>>>> >> >> > >> > > > > > > >> >>>>> 1. Flink client not always need >>>>> a local jar to start >>>>> >>>>> >> >> > >> a >>>>> >>>>> >> >> > >> > > Flink >>>>> >>>>> >> >> > >> > > > > > > per-job >>>>> >>>>> >> >> > >> > > > > > > >> >>>>> cluster. We could >>>>> >>>>> >> >> > >> > > > > > > >> >>>>> support multiple schemas. For >>>>> example, >>>>> >>>>> >> >> > >> > > > file:///path/of/my.jar >>>>> >>>>> >> >> > >> > > > > > > means >>>>> >>>>> >> >> > >> > > > > > > >> a >>>>> >>>>> >> >> > >> > > > > > > >> >>> jar >>>>> >>>>> >> >> > >> > > > > > > >> >>>>> located >>>>> >>>>> >> >> > >> > > > > > > >> >>>>> at client side, >>>>> >>>>> >> >> > >> hdfs://myhdfs/user/myname/flink/my.jar >>>>> >>>>> >> >> > >> > > > means a >>>>> >>>>> >> >> > >> > > > > > jar >>>>> >>>>> >> >> > >> > > > > > > >> >>> located >>>>> >>>>> >> >> > >> > > > > > > >> >>>>> at >>>>> >>>>> >> >> > >> > > > > > > >> >>>>> remote hdfs, >>>>> local:///path/in/image/my.jar means a >>>>> >>>>> >> >> > >> jar >>>>> >>>>> >> >> > >> > > > located >>>>> >>>>> >> >> > >> > > > > > at >>>>> >>>>> >> >> > >> > > > > > > >> >>>>> jobmanager side. >>>>> >>>>> >> >> > >> > > > > > > >> >>>>> >>>>> >>>>> >> >> > >> > > > > > > >> >>>>> 2. Support running user program >>>>> on master side. This >>>>> >>>>> >> >> > >> also >>>>> >>>>> >> >> > >> > > > > means >>>>> >>>>> >> >> > >> > > > > > > the >>>>> >>>>> >> >> > >> > > > > > > >> >>> entry >>>>> >>>>> >> >> > >> > > > > > > >> >>>>> point >>>>> >>>>> >> >> > >> > > > > > > >> >>>>> will generate the job graph on >>>>> master side. We could >>>>> >>>>> >> >> > >> use >>>>> >>>>> >> >> > >> > > the >>>>> >>>>> >> >> > >> > > > > > > >> >>>>> ClasspathJobGraphRetriever >>>>> >>>>> >> >> > >> > > > > > > >> >>>>> or start a local Flink client to >>>>> achieve this >>>>> >>>>> >> >> > >> purpose. 
>>>>> >>>>> >> >> > >> > > > > > > >> >>>>> >>>>> >>>>> >> >> > >> > > > > > > >> >>>>> >>>>> >>>>> >> >> > >> > > > > > > >> >>>>> cc tison, Aljoscha & Kostas Do >>>>> you think this is the >>>>> >>>>> >> >> > >> right >>>>> >>>>> >> >> > >> > > > > > > >> direction we >>>>> >>>>> >> >> > >> > > > > > > >> >>>>> need to work? >>>>> >>>>> >> >> > >> > > > > > > >> >>>>> >>>>> >>>>> >> >> > >> > > > > > > >> >>>>> tison <wander4...@gmail.com> >>>>> 于2019年12月12日周四 >>>>> >>>>> >> >> > >> 下午4:48写道: >>>>> >>>>> >> >> > >> > > > > > > >> >>>>> >>>>> >>>>> >> >> > >> > > > > > > >> >>>>>> A quick idea is that we >>>>> separate the deployment >>>>> >>>>> >> >> > >> from user >>>>> >>>>> >> >> > >> > > > > > program >>>>> >>>>> >> >> > >> > > > > > > >> >>> that >>>>> >>>>> >> >> > >> > > > > > > >> >>>>> it >>>>> >>>>> >> >> > >> > > > > > > >> >>>>>> has always been done >>>>> >>>>> >> >> > >> > > > > > > >> >>>>>> outside the program. On user >>>>> program executed there >>>>> >>>>> >> >> > >> is >>>>> >>>>> >> >> > >> > > > > always a >>>>> >>>>> >> >> > >> > > > > > > >> >>>>>> ClusterClient that communicates >>>>> with >>>>> >>>>> >> >> > >> > > > > > > >> >>>>>> an existing cluster, remote or >>>>> local. It will be >>>>> >>>>> >> >> > >> another >>>>> >>>>> >> >> > >> > > > > thread >>>>> >>>>> >> >> > >> > > > > > > so >>>>> >>>>> >> >> > >> > > > > > > >> >>> just >>>>> >>>>> >> >> > >> > > > > > > >> >>>>> for >>>>> >>>>> >> >> > >> > > > > > > >> >>>>>> your information. >>>>> >>>>> >> >> > >> > > > > > > >> >>>>>> >>>>> >>>>> >> >> > >> > > > > > > >> >>>>>> Best, >>>>> >>>>> >> >> > >> > > > > > > >> >>>>>> tison. >>>>> >>>>> >> >> > >> > > > > > > >> >>>>>> >>>>> >>>>> >> >> > >> > > > > > > >> >>>>>> >>>>> >>>>> >> >> > >> > > > > > > >> >>>>>> tison <wander4...@gmail.com> >>>>> 于2019年12月12日周四 >>>>> >>>>> >> >> > >> 下午4:40写道: >>>>> >>>>> >> >> > >> > > > > > > >> >>>>>> >>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>> Hi Peter, >>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>> >>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>> Another concern I realized >>>>> recently is that with >>>>> >>>>> >> >> > >> current >>>>> >>>>> >> >> > >> > > > > > > Executors >>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>> abstraction(FLIP-73) >>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>> I'm afraid that user program >>>>> is designed to ALWAYS >>>>> >>>>> >> >> > >> run >>>>> >>>>> >> >> > >> > > on >>>>> >>>>> >> >> > >> > > > > the >>>>> >>>>> >> >> > >> > > > > > > >> >>> client >>>>> >>>>> >> >> > >> > > > > > > >> >>>>>> side. >>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>> Specifically, >>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>> we deploy the job in executor >>>>> when env.execute >>>>> >>>>> >> >> > >> called. >>>>> >>>>> >> >> > >> > > > This >>>>> >>>>> >> >> > >> > > > > > > >> >>>>> abstraction >>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>> possibly prevents >>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>> Flink runs user program on the >>>>> cluster side. 
>>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>> >>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>> For your proposal, in this >>>>> case we already >>>>> >>>>> >> >> > >> compiled the >>>>> >>>>> >> >> > >> > > > > > program >>>>> >>>>> >> >> > >> > > > > > > >> and >>>>> >>>>> >> >> > >> > > > > > > >> >>>>> run >>>>> >>>>> >> >> > >> > > > > > > >> >>>>>> on >>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>> the client side, >>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>> even we deploy a cluster and >>>>> retrieve job graph >>>>> >>>>> >> >> > >> from >>>>> >>>>> >> >> > >> > > > program >>>>> >>>>> >> >> > >> > > > > > > >> >>>>> metadata, it >>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>> doesn't make >>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>> many sense. >>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>> >>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>> cc Aljoscha & Kostas what do >>>>> you think about this >>>>> >>>>> >> >> > >> > > > > constraint? >>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>> >>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>> Best, >>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>> tison. >>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>> >>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>> >>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>> Peter Huang < >>>>> huangzhenqiu0...@gmail.com> >>>>> >>>>> >> >> > >> 于2019年12月10日周二 >>>>> >>>>> >> >> > >> > > > > > > >> 下午12:45写道: >>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>> >>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>>> Hi Tison, >>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>>> >>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>>> Yes, you are right. I think I >>>>> made the wrong >>>>> >>>>> >> >> > >> argument >>>>> >>>>> >> >> > >> > > in >>>>> >>>>> >> >> > >> > > > > the >>>>> >>>>> >> >> > >> > > > > > > doc. >>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>>> Basically, the packaging jar >>>>> problem is only for >>>>> >>>>> >> >> > >> > > platform >>>>> >>>>> >> >> > >> > > > > > > users. >>>>> >>>>> >> >> > >> > > > > > > >> >>> In >>>>> >>>>> >> >> > >> > > > > > > >> >>>>> our >>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>>> internal deploy service, >>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>>> we further optimized the >>>>> deployment latency by >>>>> >>>>> >> >> > >> letting >>>>> >>>>> >> >> > >> > > > > users >>>>> >>>>> >> >> > >> > > > > > to >>>>> >>>>> >> >> > >> > > > > > > >> >>>>>> packaging >>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>>> flink-runtime together with >>>>> the uber jar, so that >>>>> >>>>> >> >> > >> we >>>>> >>>>> >> >> > >> > > > don't >>>>> >>>>> >> >> > >> > > > > > need >>>>> >>>>> >> >> > >> > > > > > > >> to >>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>>> consider >>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>>> multiple flink version >>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>>> support for now. In the >>>>> session client mode, as >>>>> >>>>> >> >> > >> Flink >>>>> >>>>> >> >> > >> > > > libs >>>>> >>>>> >> >> > >> > > > > > will >>>>> >>>>> >> >> > >> > > > > > > >> be >>>>> >>>>> >> >> > >> > > > > > > >> >>>>>> shipped >>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>>> anyway as local resources of >>>>> yarn. Users actually >>>>> >>>>> >> >> > >> don't >>>>> >>>>> >> >> > >> > > > > need >>>>> >>>>> >> >> > >> > > > > > to >>>>> >>>>> >> >> > >> > > > > > > >> >>>>> package >>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>>> those libs into job jar. 
>>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>>> >>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>>> >>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>>> >>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>>> Best Regards >>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>>> Peter Huang >>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>>> >>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>>> On Mon, Dec 9, 2019 at 8:35 >>>>> PM tison < >>>>> >>>>> >> >> > >> > > > wander4...@gmail.com >>>>> >>>>> >> >> > >> > > > > > >>>>> >>>>> >> >> > >> > > > > > > >> >>> wrote: >>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>>> >>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>>>>> 3. What do you mean about >>>>> the package? Do users >>>>> >>>>> >> >> > >> need >>>>> >>>>> >> >> > >> > > to >>>>> >>>>> >> >> > >> > > > > > > >> >>> compile >>>>> >>>>> >> >> > >> > > > > > > >> >>>>>> their >>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>>>> jars >>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>>>> inlcuding flink-clients, >>>>> flink-optimizer, >>>>> >>>>> >> >> > >> flink-table >>>>> >>>>> >> >> > >> > > > > codes? >>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>>>> >>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>>>> The answer should be no >>>>> because they exist in >>>>> >>>>> >> >> > >> system >>>>> >>>>> >> >> > >> > > > > > > classpath. >>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>>>> >>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>>>> Best, >>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>>>> tison. >>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>>>> >>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>>>> >>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>>>> Yang Wang < >>>>> danrtsey...@gmail.com> 于2019年12月10日周二 >>>>> >>>>> >> >> > >> > > > > 下午12:18写道: >>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>>>> >>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>>>>> Hi Peter, >>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>>>>> >>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>>>>> Thanks a lot for starting >>>>> this discussion. I >>>>> >>>>> >> >> > >> think >>>>> >>>>> >> >> > >> > > this >>>>> >>>>> >> >> > >> > > > > is >>>>> >>>>> >> >> > >> > > > > > a >>>>> >>>>> >> >> > >> > > > > > > >> >>> very >>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>>> useful >>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>>>>> feature. >>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>>>>> >>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>>>>> Not only for Yarn, i am >>>>> focused on flink on >>>>> >>>>> >> >> > >> > > Kubernetes >>>>> >>>>> >> >> > >> > > > > > > >> >>>>> integration >>>>> >>>>> >> >> > >> > > > > > > >> >>>>>> and >>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>>>> come >>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>>>>> across the same >>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>>>>> problem. I do not want the >>>>> job graph generated >>>>> >>>>> >> >> > >> on >>>>> >>>>> >> >> > >> > > > client >>>>> >>>>> >> >> > >> > > > > > > side. >>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>>> Instead, >>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>>>> the >>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>>>>> user jars are built in >>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>>>>> a user-defined image. When >>>>> the job manager >>>>> >>>>> >> >> > >> launched, >>>>> >>>>> >> >> > >> > > we >>>>> >>>>> >> >> > >> > > > > > just >>>>> >>>>> >> >> > >> > > > > > > >> >>>>> need to >>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>>>>> generate the job graph >>>>> >>>>> >> >> > >> > > > > > > >> >>>>>>>>>> based on local user jars. 
I have some small suggestions about this.

1. `ProgramJobGraphRetriever` is very similar to `ClasspathJobGraphRetriever`; the difference is that the former needs `ProgramMetadata` and the latter needs some arguments. Is it possible to have a unified `JobGraphRetriever` to support both?
2. Is it possible to not use a local user jar to start a per-job cluster? In your case, the user jars already exist on HDFS, yet we still need to download the jars to the deployer service. Currently, we always need a local user jar to start a Flink cluster. It would be great if we could support remote user jars.

> In the implementation, we assume users package flink-clients, flink-optimizer, flink-table together within the job jar. Otherwise, the job graph generation within JobClusterEntryPoint will fail.

3. What do you mean by the package? Do users need to compile their jars including flink-clients, flink-optimizer, flink-table codes?

Best,
Yang
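As a rough illustration of suggestions 1 and 2 above, a unified retriever could implement the existing `JobGraphRetriever` interface and accept a jar location that is either local or remote, fetching remote jars through Flink's `FileSystem` abstraction before generating the graph. The class below is purely hypothetical; only `JobGraphRetriever`, `FileSystem`, and the `PackagedProgram` utilities are existing Flink APIs, and their exact signatures depend on the Flink version.

import org.apache.flink.client.program.PackagedProgram;
import org.apache.flink.client.program.PackagedProgramUtils;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.core.fs.FSDataInputStream;
import org.apache.flink.core.fs.FileSystem;
import org.apache.flink.core.fs.Path;
import org.apache.flink.runtime.entrypoint.component.JobGraphRetriever;
import org.apache.flink.runtime.jobgraph.JobGraph;
import org.apache.flink.util.FlinkException;

import java.io.File;
import java.io.FileOutputStream;
import java.net.URI;

// Hypothetical unified retriever: takes a jar location that may be local
// ("file:///...") or remote ("hdfs://...", "s3://..."), downloads it if
// necessary, and then generates the JobGraph on the cluster side.
public class UnifiedJobGraphRetriever implements JobGraphRetriever {

    private final URI userJarLocation;    // e.g. hdfs:///jobs/my-job.jar (illustrative)
    private final String entryPointClass; // illustrative

    public UnifiedJobGraphRetriever(URI userJarLocation, String entryPointClass) {
        this.userJarLocation = userJarLocation;
        this.entryPointClass = entryPointClass;
    }

    @Override
    public JobGraph retrieveJobGraph(Configuration configuration) throws FlinkException {
        try {
            File localJar = "file".equals(userJarLocation.getScheme())
                    ? new File(userJarLocation.getPath())
                    : fetchToLocalTmp(userJarLocation);

            PackagedProgram program = PackagedProgram.newBuilder()
                    .setJarFile(localJar)
                    .setEntryPointClassName(entryPointClass)
                    .build();

            return PackagedProgramUtils.createJobGraph(program, configuration, 1, false);
        } catch (Exception e) {
            throw new FlinkException("Could not create the JobGraph from " + userJarLocation, e);
        }
    }

    // Fetch a remote jar through Flink's FileSystem abstraction (HDFS, S3, ...).
    private static File fetchToLocalTmp(URI remoteUri) throws Exception {
        File target = File.createTempFile("user-job", ".jar");
        FileSystem fs = FileSystem.get(remoteUri);
        try (FSDataInputStream in = fs.open(new Path(remoteUri));
             FileOutputStream out = new FileOutputStream(target)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        }
        return target;
    }
}

Whether such a class should subsume `ClasspathJobGraphRetriever` or live beside it is exactly the open question in suggestion 1.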
Peter Huang <huangzhenqiu0...@gmail.com> 于2019年12月10日周二 上午2:37写道:

Dear All,

Recently, the Flink community has started to improve the YARN cluster descriptor to make the job jar and config files configurable from the CLI. It improves the flexibility of Flink deployment in YARN per-job mode. For platform users who manage tens of hundreds of streaming pipelines for the whole org or company, we found that job graph generation on the client side is another pain point. Thus, we want to propose a configurable feature for FlinkYarnSessionCli. The feature allows users to choose job graph generation in the Flink ClusterEntryPoint so that the job jar doesn't need to be available locally for job graph generation.
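To make the "configurable" part concrete, one way the choice could be surfaced (purely hypothetical, not something the FLIP defines) is a boolean option that the CLI sets and the entrypoint reads, sketched here with Flink's `ConfigOptions` builder as it exists in recent versions; the key name and wording are invented for illustration.

import org.apache.flink.configuration.ConfigOption;
import org.apache.flink.configuration.ConfigOptions;

// Hypothetical option: the CLI would set it, and the ClusterEntrypoint would
// read it to decide whether to generate the job graph on the cluster side.
public final class DeferredJobGraphOptions {

    public static final ConfigOption<Boolean> GENERATE_JOB_GRAPH_ON_CLUSTER =
            ConfigOptions.key("execution.deferred-job-graph") // invented key, for illustration only
                    .booleanType()
                    .defaultValue(false)
                    .withDescription(
                            "Whether the job graph is generated inside the ClusterEntrypoint "
                                    + "from the shipped user jar instead of on the client.");

    private DeferredJobGraphOptions() {}
}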
The proposal is organized as a FLIP:
https://cwiki.apache.org/confluence/display/FLINK/FLIP-85+Delayed+JobGraph+Generation

Any questions and suggestions are welcomed. Thank you in advance.

Best Regards
Peter Huang