Thanks Yang, that would be very helpful!
Jiangjie (Becket) Qin

On Mon, Mar 9, 2020 at 3:31 PM Yang Wang <danrtsey...@gmail.com> wrote:

> Hi Becket,
>
> Thanks for your suggestion. We will update the FLIP to add/enrich the following parts.
> * User CLI option change: use "-R/--remote" to apply the cluster deploy mode
> * Configuration change: how to specify remote user jars and dependencies
> * The whole story of how "application mode" works: upload -> fetch -> submit job
> * The cluster lifecycle: when and how the Flink cluster is destroyed
>
> Best,
> Yang
>
> On Mon, Mar 9, 2020 at 12:34 PM Becket Qin <becket....@gmail.com> wrote:
>
>> Thanks for the reply, tison and Yang,
>>
>> Regarding the public interface, is the "-R/--remote" option the only change? Will the users also need to provide a remote location to upload and store the jars, and a list of jars as dependencies to be uploaded?
>>
>> It is important that the public interface section in the FLIP includes all the user-visible changes, including the CLI / configuration / metrics, etc. Can we update the FLIP to include the conclusion we have reached here on the ML?
>>
>> Thanks,
>>
>> Jiangjie (Becket) Qin
>>
>> On Mon, Mar 9, 2020 at 11:59 AM Yang Wang <danrtsey...@gmail.com> wrote:
>>
>>> Hi Becket,
>>>
>>> Thanks for jumping in and sharing your concerns. I second tison's answer and just make some additions.
>>>
>>> > job submission interface
>>>
>>> This FLIP will introduce an interface for running the user `main()` on the cluster, named “ProgramDeployer”. However, it is not a public interface. It will be used in `CliFrontend` when the remote deploy option (-R/--remote-deploy) is specified. So the only change on the user side is the CLI option.
>>>
>>> > How to fetch the jars?
>>>
>>> The “local path” and “dfs path” could be supported to fetch the user jars and dependencies.
>>> Just like tison has said, we could ship the user jar and dependencies from the client side to HDFS and use the entrypoint to fetch them.
>>>
>>> Also, we have some other practical ways to use the new “application mode”.
>>> 1. Upload the user jars and dependencies to the DFS (e.g. HDFS, S3, Aliyun OSS) manually or via some external deployer system. For K8s, the user jars and dependencies could also be built into the docker image.
>>> 2. Specify the remote/local user jar and dependencies in `flink run`. Usually this could also be done by the external deployer system.
>>> 3. When the `ClusterEntrypoint` is launched, it will fetch the jars and files automatically. We do not need any specific fetcher implementation, since we could leverage the Flink `FileSystem` to do this.
>>>
>>> Best,
>>> Yang
>>>
>>> On Mon, Mar 9, 2020 at 11:34 AM tison <wander4...@gmail.com> wrote:
>>>
>>>> Hi Becket,
>>>>
>>>> Thanks for your attention to FLIP-85! I answered your questions inline.
>>>>
>>>> 1. What exactly will the job submission interface look like after this FLIP? The FLIP template has a Public Interface section, but it was removed from this FLIP.
>>>>
>>>> As Yang mentioned earlier in this thread:
>>>>
>>>> From the user's perspective, only a `-R/--remote-deploy` CLI option is visible. They are not aware of the application mode.
>>>>
>>>> 2. How will the new ClusterEntrypoint fetch the jars from external storage? What external storage will be supported out of the box? Will this "jar fetcher" be pluggable? If so, what does the API look like and how will users specify the custom "jar fetcher"?
>>>>
>>>> It depends, actually. Here are several points:
>>>>
>>>> i. Currently, shipping user files is handled by Flink; dependency fetching can be handled by Flink, too.
>>>> ii. Currently, we only support local file system shipfiles.
>>>> When in Application Mode, to support meaningful jar fetching we should first let users configure richer shipfile schemes.
>>>> iii. Dependency fetching varies across deployments. That is, on YARN the convention is to go through HDFS; on Kubernetes the convention is a configured resource server, with fetching done by an initContainer.
>>>>
>>>> Thus, in the first phase of Application Mode, dependency fetching is handled entirely within Flink.
>>>>
>>>> 3. It sounds that in this FLIP, the "session cluster" running the application has the same lifecycle as the user application. How will the session cluster be torn down after the application finishes? Will the ClusterEntrypoint do that? Will there be an option of not tearing the cluster down?
>>>>
>>>> The precondition for tearing down the cluster is *both*
>>>>
>>>> i. the user main has reached its end
>>>> ii. all submitted jobs (currently, at most one) have reached a globally terminal state
>>>>
>>>> As for the "how", it is an implementation topic, but conceptually it is the ClusterEntrypoint's responsibility.
>>>>
>>>> > Will there be an option of not tearing the cluster down?
>>>>
>>>> I think the answer is "No", because the cluster is designed to be bound to an application. User logic that communicates with the job is always in its `main`, and for history information we have the history server.
>>>>
>>>> Best,
>>>> tison.
>>>>
>>>> On Mon, Mar 9, 2020 at 8:12 AM Becket Qin <becket....@gmail.com> wrote:
>>>>
>>>>> Hi Peter and Kostas,
>>>>>
>>>>> Thanks for creating this FLIP. Moving the JobGraph compilation to the cluster makes a lot of sense to me. FLIP-40 had exactly the same idea, but it is currently dormant and can probably be superseded by this FLIP. After reading the FLIP, I still have a few questions.
>>>>>
>>>>> 1. What exactly will the job submission interface look like after this FLIP?
>>>>> The FLIP template has a Public Interface section, but it was removed from this FLIP.
>>>>> 2. How will the new ClusterEntrypoint fetch the jars from external storage? What external storage will be supported out of the box? Will this "jar fetcher" be pluggable? If so, what does the API look like and how will users specify the custom "jar fetcher"?
>>>>> 3. It sounds that in this FLIP, the "session cluster" running the application has the same lifecycle as the user application. How will the session cluster be torn down after the application finishes? Will the ClusterEntrypoint do that? Will there be an option of not tearing the cluster down?
>>>>>
>>>>> Maybe they have been discussed on the ML earlier, but I think they should also be part of the FLIP.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Jiangjie (Becket) Qin
>>>>>
>>>>> On Thu, Mar 5, 2020 at 10:09 PM Kostas Kloudas <kklou...@gmail.com> wrote:
>>>>>
>>>>>> Also from my side +1 to start voting.
>>>>>>
>>>>>> Cheers,
>>>>>> Kostas
>>>>>>
>>>>>> On Thu, Mar 5, 2020 at 7:45 AM tison <wander4...@gmail.com> wrote:
>>>>>> >
>>>>>> > +1 to start voting.
>>>>>> >
>>>>>> > Best,
>>>>>> > tison.
>>>>>> >
>>>>>> > On Thu, Mar 5, 2020 at 2:29 PM Yang Wang <danrtsey...@gmail.com> wrote:
>>>>>> >>
>>>>>> >> Hi Peter,
>>>>>> >> Many thanks for your response.
>>>>>> >>
>>>>>> >> Hi all @Kostas Kloudas @Zili Chen @Peter Huang @Rong Rong
>>>>>> >> It seems that we have reached an agreement. The “application mode” is regarded as the enhanced “per-job”. It is orthogonal to “cluster deploy”. Currently, we bind “per-job” to `run-user-main-on-client` and “application mode” to `run-user-main-on-cluster`.
>>>>>> >>
>>>>>> >> Do you have other concerns about moving FLIP-85 to voting?
>>>>>> >>
>>>>>> >> Best,
>>>>>> >> Yang
>>>>>> >>
>>>>>> >> On Thu, Mar 5, 2020 at 12:48 PM Peter Huang <huangzhenqiu0...@gmail.com> wrote:
>>>>>> >>>
>>>>>> >>> Hi Yang and Kostas,
>>>>>> >>>
>>>>>> >>> Thanks for the clarification. It makes more sense to me if the long-term goal is to replace per-job mode with application mode in the future (once multiple executes can be supported). Before that, it will be better to keep the concept of application mode internal. As Yang suggested, users only need to use a `-R/--remote-deploy` CLI option to launch a per-job cluster with the main function executed in the cluster entry point. +1 for the execution plan.
>>>>>> >>>
>>>>>> >>> Best Regards
>>>>>> >>> Peter Huang
>>>>>> >>>
>>>>>> >>> On Tue, Mar 3, 2020 at 7:11 AM Yang Wang <danrtsey...@gmail.com> wrote:
>>>>>> >>>>
>>>>>> >>>> Hi Peter,
>>>>>> >>>>
>>>>>> >>>> Having the application mode does not mean we will drop the cluster-deploy option. I just want to share some thoughts about “Application Mode”.
>>>>>> >>>>
>>>>>> >>>> 1. The application mode could cover the per-job semantics. Its lifecycle is bound to the user `main()`, and all the jobs in the user main will be executed in the same Flink cluster. In the first phase of the FLIP-85 implementation, running the user main on the cluster side could be supported in application mode.
>>>>>> >>>>
>>>>>> >>>> 2. Maybe in the future we will also need to support multiple `execute()` calls on the client side in the same Flink cluster. Then the per-job mode will evolve into application mode.
>>>>>> >>>>
>>>>>> >>>> 3. From the user's perspective, only a `-R/--remote-deploy` CLI option is visible. They are not aware of the application mode.
>>>>>> >>>>
>>>>>> >>>> 4. In the first phase, the application mode works as “per-job” (only one job in the user main). We just leave more potential for the future.
>>>>>> >>>>
>>>>>> >>>> I am not against calling it “cluster deploy mode” if you all think it is clearer for users.
>>>>>> >>>>
>>>>>> >>>> Best,
>>>>>> >>>> Yang
>>>>>> >>>>
>>>>>> >>>> On Tue, Mar 3, 2020 at 6:49 PM Kostas Kloudas <kklou...@gmail.com> wrote:
>>>>>> >>>>>
>>>>>> >>>>> Hi Peter,
>>>>>> >>>>>
>>>>>> >>>>> I understand your point. This is why I was also a bit torn about the name, and my proposal was a bit aligned with yours (something along the lines of "cluster deploy" mode).
>>>>>> >>>>>
>>>>>> >>>>> But many of the other participants in the discussion suggested "Application Mode". I think that the reasoning is that now the user's Application is more self-contained. It will be submitted to the cluster and the user can just disconnect. In addition, as discussed briefly in the doc, in the future there may be better support for multi-execute applications, which will bring us one step closer to the true "Application Mode". But this is how I interpreted their arguments; of course they can also express their thoughts on the topic :)
>>>>>> >>>>>
>>>>>> >>>>> Cheers,
>>>>>> >>>>> Kostas
>>>>>> >>>>>
>>>>>> >>>>> On Mon, Mar 2, 2020 at 6:15 PM Peter Huang <huangzhenqiu0...@gmail.com> wrote:
>>>>>> >>>>> >
>>>>>> >>>>> > Hi Kostas,
>>>>>> >>>>> >
>>>>>> >>>>> > Thanks for updating the wiki. We have aligned with the implementations in the doc. But I feel the naming is still a little bit confusing from a user's perspective. It is well known that Flink supports per-job clusters and session clusters.
>>>>>> >>>>> > That concept sits at the layer of how a job is managed within Flink. The method introduced until now is a kind of mix of job and session cluster, to compromise on the implementation complexity. We probably don't need to label it as Application Mode at the same layer as per-job cluster and session cluster. Conceptually, I think it is still a cluster-mode implementation for the per-job cluster.
>>>>>> >>>>> >
>>>>>> >>>>> > To minimize confusion for users, I think it would be better to make it just an option of the per-job cluster for each type of cluster manager. What do you think?
>>>>>> >>>>> >
>>>>>> >>>>> > Best Regards
>>>>>> >>>>> > Peter Huang
>>>>>> >>>>> >
>>>>>> >>>>> > On Mon, Mar 2, 2020 at 7:22 AM Kostas Kloudas <kklou...@gmail.com> wrote:
>>>>>> >>>>> >>
>>>>>> >>>>> >> Hi Yang,
>>>>>> >>>>> >>
>>>>>> >>>>> >> The difference between per-job and application mode is that, as you described, in the per-job mode the main is executed on the client, while in the application mode the main is executed on the cluster. I do not think we have to offer "application mode" with the main running on the client side, as this is exactly what the per-job mode does currently and, as you also described, it would be redundant.
>>>>>> >>>>> >>
>>>>>> >>>>> >> Sorry if this was not clear in the document.
>>>>>> >>>>> >>
>>>>>> >>>>> >> Cheers,
>>>>>> >>>>> >> Kostas
>>>>>> >>>>> >>
>>>>>> >>>>> >> On Mon, Mar 2, 2020 at 3:17 PM Yang Wang <danrtsey...@gmail.com> wrote:
>>>>>> >>>>> >> >
>>>>>> >>>>> >> > Hi Kostas,
>>>>>> >>>>> >> >
>>>>>> >>>>> >> > Thanks a lot for your conclusion and for updating the FLIP-85 wiki.
>>>>>> >>>>> >> > Currently, I have no more questions about the motivation, approach, fault tolerance and the first-phase implementation.
>>>>>> >>>>> >> >
>>>>>> >>>>> >> > The new title "Flink Application Mode" makes a lot of sense to me. Especially for containerized environments, the cluster deploy option will be very useful.
>>>>>> >>>>> >> >
>>>>>> >>>>> >> > Just one concern: how do we introduce this new application mode to our users? Each user program (i.e. `main()`) is an application. Currently, we intend to support only one `execute()`. So what's the difference between per-job and application mode?
>>>>>> >>>>> >> >
>>>>>> >>>>> >> > For per-job, the user `main()` is always executed on the client side. And for application mode, the user `main()` could be executed on the client or master side (configured via a CLI option). Right? We need to have a clear concept. Otherwise, the users will be more and more confused.
>>>>>> >>>>> >> >
>>>>>> >>>>> >> > Best,
>>>>>> >>>>> >> > Yang
>>>>>> >>>>> >> >
>>>>>> >>>>> >> > On Mon, Mar 2, 2020 at 5:58 PM Kostas Kloudas <kklou...@gmail.com> wrote:
>>>>>> >>>>> >> >>
>>>>>> >>>>> >> >> Hi all,
>>>>>> >>>>> >> >>
>>>>>> >>>>> >> >> I updated
>>>>>> >>>>> >> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-85+Flink+Application+Mode
>>>>>> >>>>> >> >> based on the discussion we had here:
>>>>>> >>>>> >> >>
>>>>>> >>>>> >> >> https://docs.google.com/document/d/1ji72s3FD9DYUyGuKnJoO4ApzV-nSsZa0-bceGXW7Ocw/edit#
>>>>>> >>>>> >> >>
>>>>>> >>>>> >> >> Please let me know what you think and please keep the discussion in the ML :)
>>>>>> >>>>> >> >>
>>>>>> >>>>> >> >> Thanks for starting the discussion and I hope that soon we will be able to vote on the FLIP.
>>>>>> >>>>> >> >>
>>>>>> >>>>> >> >> Cheers,
>>>>>> >>>>> >> >> Kostas
>>>>>> >>>>> >> >>
>>>>>> >>>>> >> >> On Thu, Jan 16, 2020 at 3:40 AM Yang Wang <danrtsey...@gmail.com> wrote:
>>>>>> >>>>> >> >> >
>>>>>> >>>>> >> >> > Hi all,
>>>>>> >>>>> >> >> >
>>>>>> >>>>> >> >> > Thanks a lot for the feedback from @Kostas Kloudas. All your concerns are on point. FLIP-85 is mainly focused on supporting cluster mode for per-job, since it is more urgent and has many more use cases in both Yarn and Kubernetes deployments. For session clusters, we could have more discussion in a new thread later.
>>>>>> >>>>> >> >> >
>>>>>> >>>>> >> >> > #1, How to download the user jars and dependencies for per-job in cluster mode?
>>>>>> >>>>> >> >> > For Yarn, we could register the user jars and dependencies as LocalResource. They will be distributed by Yarn, and once the JobManager and TaskManager are launched, the jars already exist.
>>>>>> >>>>> >> >> > For Standalone per-job and K8s, we expect that the user jars and dependencies are built into the image. Or the InitContainer could be used for downloading. It is natively distributed and we will not have a bottleneck.
>>>>>> >>>>> >> >> >
>>>>>> >>>>> >> >> > #2, Job graph recovery
>>>>>> >>>>> >> >> > We could have an optimization to store the job graph on the DFS. However, I suggest that building a new job graph from the configuration be the default option, since we will not always have a DFS store when deploying a Flink per-job cluster.
>>>>>> >>>>> >> >> > Of course, we assume that using the same configuration (e.g. job_id, user_jar, main_class, main_args, parallelism, savepoint_settings, etc.) will produce the same job graph. I think the standalone per-job mode already has similar behavior.
>>>>>> >>>>> >> >> >
>>>>>> >>>>> >> >> > #3, What happens with jobs that have multiple execute calls?
>>>>>> >>>>> >> >> > Currently, it is really a problem. Even if we use a local client on the Flink master side, it will behave differently from client mode. In client mode, if we execute multiple times, we will deploy a separate Flink cluster for each execute. I am not quite sure whether that is reasonable. However, I still think using the local client is a good choice. We could continue the discussion in a new thread. @Zili Chen <wander4...@gmail.com> Do you want to drive this?
>>>>>> >>>>> >> >> >
>>>>>> >>>>> >> >> > Best,
>>>>>> >>>>> >> >> > Yang
>>>>>> >>>>> >> >> >
>>>>>> >>>>> >> >> > On Thu, Jan 16, 2020 at 1:55 AM Peter Huang <huangzhenqiu0...@gmail.com> wrote:
>>>>>> >>>>> >> >> >
>>>>>> >>>>> >> >> > > Hi Kostas,
>>>>>> >>>>> >> >> > >
>>>>>> >>>>> >> >> > > Thanks for this feedback. I can't agree more with the opinion. The cluster mode should be added first in the per-job cluster.
>>>>>> >>>>> >> >> > >
>>>>>> >>>>> >> >> > > 1) For job cluster implementation
>>>>>> >>>>> >> >> > > 1. Job graph recovery: from configuration, or stored as a static job graph as in the session cluster. I think the static one will be better because of the shorter recovery time. Let me update the doc for details.
>>>>>> >>>>> >> >> > >
>>>>>> >>>>> >> >> > > 2. For jobs that execute multiple times, I think @Zili Chen <wander4...@gmail.com> has proposed the local client solution that can actually run the program in the cluster entry point. We can put the implementation in the second stage, or even in a new FLIP for further discussion.
>>>>>> >>>>> >> >> > >
>>>>>> >>>>> >> >> > > 2) For session cluster implementation
>>>>>> >>>>> >> >> > > We can disable the cluster mode for the session cluster in the first stage. I agree the jar downloading will be a painful thing. We can consider a PoC and performance evaluation first. If the end-to-end experience is good enough, then we can consider proceeding with the solution.
>>>>>> >>>>> >> >> > >
>>>>>> >>>>> >> >> > > Looking forward to more opinions from @Yang Wang <danrtsey...@gmail.com> @Zili Chen <wander4...@gmail.com> @Dian Fu <dian0511...@gmail.com>.
>>>>>> >>>>> >> >> > >
>>>>>> >>>>> >> >> > > Best Regards
>>>>>> >>>>> >> >> > > Peter Huang
>>>>>> >>>>> >> >> > >
>>>>>> >>>>> >> >> > > On Wed, Jan 15, 2020 at 7:50 AM Kostas Kloudas <kklou...@gmail.com> wrote:
>>>>>> >>>>> >> >> > >
>>>>>> >>>>> >> >> > >> Hi all,
>>>>>> >>>>> >> >> > >>
>>>>>> >>>>> >> >> > >> I am writing here as the discussion on the Google Doc seems to be a bit difficult to follow.
>>>>>> >>>>> >> >> > >>
>>>>>> >>>>> >> >> > >> I think that in order to be able to make progress, it would be helpful to focus on per-job mode for now.
>>>>>> >>>>> >> >> > >> The reason is that:
>>>>>> >>>>> >> >> > >> 1) making the (unique) JobSubmitHandler responsible for creating the jobgraphs, which includes downloading dependencies, is not an optimal solution
>>>>>> >>>>> >> >> > >> 2) even if we put the responsibility on the JobMaster, currently each job has its own JobMaster but they all run in the same process, so we have again a single entity.
>>>>>> >>>>> >> >> > >>
>>>>>> >>>>> >> >> > >> Of course after this is done, and if we feel comfortable with the solution, then we can go to the session mode.
>>>>>> >>>>> >> >> > >>
>>>>>> >>>>> >> >> > >> A second comment has to do with fault-tolerance in the per-job, cluster-deploy mode. In the document, it is suggested that upon recovery, the JobMaster of each job re-creates the JobGraph. I am just wondering if it is better to create and store the JobGraph upon submission and only fetch it upon recovery, so that we have a static JobGraph.
>>>>>> >>>>> >> >> > >>
>>>>>> >>>>> >> >> > >> Finally, I have a question: what happens with jobs that have multiple execute calls? The semantics seem to change compared to the current behaviour, right?
>>>>>> >>>>> >> >> > >>
>>>>>> >>>>> >> >> > >> Cheers,
>>>>>> >>>>> >> >> > >> Kostas
>>>>>> >>>>> >> >> > >>
>>>>>> >>>>> >> >> > >> On Wed, Jan 8, 2020 at 8:05 PM tison <wander4...@gmail.com> wrote:
>>>>>> >>>>> >> >> > >> >
>>>>>> >>>>> >> >> > >> > Not always; Yang Wang is also not yet a committer but he can join the channel.
>>>>>> >>>>> >> >> > >> > I cannot find the id by clicking “Add new member in channel”, so I came to you to ask you to try out the link. Possibly I will find other ways, but the original purpose is that the slack channel is a public area where we discuss development...
>>>>>> >>>>> >> >> > >> >
>>>>>> >>>>> >> >> > >> > Best,
>>>>>> >>>>> >> >> > >> > tison.
>>>>>> >>>>> >> >> > >> >
>>>>>> >>>>> >> >> > >> > On Thu, Jan 9, 2020 at 2:44 AM Peter Huang <huangzhenqiu0...@gmail.com> wrote:
>>>>>> >>>>> >> >> > >> >
>>>>>> >>>>> >> >> > >> > > Hi Tison,
>>>>>> >>>>> >> >> > >> > >
>>>>>> >>>>> >> >> > >> > > I am not a committer of Flink yet. I think I can't join it either.
>>>>>> >>>>> >> >> > >> > >
>>>>>> >>>>> >> >> > >> > > Best Regards
>>>>>> >>>>> >> >> > >> > > Peter Huang
>>>>>> >>>>> >> >> > >> > >
>>>>>> >>>>> >> >> > >> > > On Wed, Jan 8, 2020 at 9:39 AM tison <wander4...@gmail.com> wrote:
>>>>>> >>>>> >> >> > >> > >
>>>>>> >>>>> >> >> > >> > > > Hi Peter,
>>>>>> >>>>> >> >> > >> > > >
>>>>>> >>>>> >> >> > >> > > > Could you try out this link?
>>>>>> >>>>> >> >> > >> > > > https://the-asf.slack.com/messages/CNA3ADZPH
>>>>>> >>>>> >> >> > >> > > >
>>>>>> >>>>> >> >> > >> > > > Best,
>>>>>> >>>>> >> >> > >> > > > tison.
>>>>>> >>>>> >> >> > >> > > >
>>>>>> >>>>> >> >> > >> > > > On Thu, Jan 9, 2020 at 1:22 AM Peter Huang <huangzhenqiu0...@gmail.com> wrote:
>>>>>> >>>>> >> >> > >> > > >
>>>>>> >>>>> >> >> > >> > > > > Hi Tison,
>>>>>> >>>>> >> >> > >> > > > >
>>>>>> >>>>> >> >> > >> > > > > I can't join the group with the shared link. Would you please add me to the group? My slack account is huangzhenqiu0825. Thank you in advance.
>>>>>> >>>>> >> >> > >> > > > >
>>>>>> >>>>> >> >> > >> > > > > Best Regards
>>>>>> >>>>> >> >> > >> > > > > Peter Huang
>>>>>> >>>>> >> >> > >> > > > >
>>>>>> >>>>> >> >> > >> > > > > On Wed, Jan 8, 2020 at 12:02 AM tison <wander4...@gmail.com> wrote:
>>>>>> >>>>> >> >> > >> > > > >
>>>>>> >>>>> >> >> > >> > > > > > Hi Peter,
>>>>>> >>>>> >> >> > >> > > > > >
>>>>>> >>>>> >> >> > >> > > > > > As described above, this effort should get attention from the people developing FLIP-73, a.k.a. Executor abstractions. I recommend that you join the public slack channel[1] for Flink Client API Enhancement, where you can share your detailed thoughts. It will possibly get more concrete attention.
>>>>>> >>>>> >> >> > >> > > > > >
>>>>>> >>>>> >> >> > >> > > > > > Best,
>>>>>> >>>>> >> >> > >> > > > > > tison.
>>>>>> >>>>> >> >> > >> > > > > >
>>>>>> >>>>> >> >> > >> > > > > > [1] https://slack.com/share/IS21SJ75H/Rk8HhUly9FuEHb7oGwBZ33uL/enQtODg2MDYwNjE5MTg3LTA2MjIzNDc1M2ZjZDVlMjdlZjk1M2RkYmJhNjAwMTk2ZDZkODQ4NmY5YmI4OGRhNWJkYTViMTM1NzlmMzc4OWM
>>>>>> >>>>> >> >> > >> > > > > >
>>>>>> >>>>> >> >> > >> > > > > > On Tue, Jan 7, 2020 at 5:09 AM Peter Huang <huangzhenqiu0...@gmail.com> wrote:
>>>>>> >>>>> >> >> > >> > > > > >
>>>>>> >>>>> >> >> > >> > > > > > > Dear All,
>>>>>> >>>>> >> >> > >> > > > > > >
>>>>>> >>>>> >> >> > >> > > > > > > Happy new year!
>>>>>> >>>>> >> >> > >> > > > > > > Following the existing feedback from the community, we revised the doc to consider session cluster support, the concrete interface changes needed, and the execution plan. Please take one more round of review at your convenience.
>>>>>> >>>>> >> >> > >> > > > > > >
>>>>>> >>>>> >> >> > >> > > > > > > https://docs.google.com/document/d/1aAwVjdZByA-0CHbgv16Me-vjaaDMCfhX7TzVVTuifYM/edit#
>>>>>> >>>>> >> >> > >> > > > > > >
>>>>>> >>>>> >> >> > >> > > > > > > Best Regards
>>>>>> >>>>> >> >> > >> > > > > > > Peter Huang
>>>>>> >>>>> >> >> > >> > > > > > >
>>>>>> >>>>> >> >> > >> > > > > > > On Thu, Jan 2, 2020 at 11:29 AM Peter Huang <huangzhenqiu0...@gmail.com> wrote:
>>>>>> >>>>> >> >> > >> > > > > > >
>>>>>> >>>>> >> >> > >> > > > > > > > Hi Dian,
>>>>>> >>>>> >> >> > >> > > > > > > > Thanks for giving us valuable feedback.
>>>>>> >>>>> >> >> > >> > > > > > > >
>>>>>> >>>>> >> >> > >> > > > > > > > 1) It's better to have a whole design for this feature
>>>>>> >>>>> >> >> > >> > > > > > > > Regarding the suggestion of enabling the cluster mode for session clusters as well, I think Flink already supports it.
>>>>>> >>>>> >> >> > >> > > > > > > > WebSubmissionExtension already allows users to start a job with a specified jar through the web UI. But we need to enable the feature from the CLI for both local jars and remote jars. I will align with Yang Wang on the details first and update the design doc.
>>>>>> >>>>> >> >> > >> > > > > > > >
>>>>>> >>>>> >> >> > >> > > > > > > > 2) It's better to consider the convenience for users, such as debugging
>>>>>> >>>>> >> >> > >> > > > > > > >
>>>>>> >>>>> >> >> > >> > > > > > > > I am wondering whether we can store the exception from job graph generation in the application master. As no streaming graph can be scheduled in this case, no more TMs will be requested from the Flink RM. If the AM is still running, users can still query it from the CLI. As it requires more changes, we can get some feedback from <aljos...@apache.org> and @zjf...@gmail.com <zjf...@gmail.com>.
>>>>>> >>>>> >> >> > >> > > > > > > >
>>>>>> >>>>> >> >> > >> > > > > > > > 3) It's better to consider the impact to the stability of the cluster
>>>>>> >>>>> >> >> > >> > > > > > > >
>>>>>> >>>>> >> >> > >> > > > > > > > I agree with Yang Wang's opinion.
>>>>>> >>>>> >> >> > >> > > > > > > >
>>>>>> >>>>> >> >> > >> > > > > > > > Best Regards
>>>>>> >>>>> >> >> > >> > > > > > > > Peter Huang
>>>>>> >>>>> >> >> > >> > > > > > > >
>>>>>> >>>>> >> >> > >> > > > > > > > On Sun, Dec 29, 2019 at 9:44 PM Dian Fu <dian0511...@gmail.com> wrote:
>>>>>> >>>>> >> >> > >> > > > > > > >
>>>>>> >>>>> >> >> > >> > > > > > > >> Hi all,
>>>>>> >>>>> >> >> > >> > > > > > > >>
>>>>>> >>>>> >> >> > >> > > > > > > >> Sorry to jump into this discussion. Thanks everyone for the discussion. I'm very interested in this topic although I'm not an expert in this part. So I'm glad to share my thoughts as follows:
>>>>>> >>>>> >> >> > >> > > > > > > >>
>>>>>> >>>>> >> >> > >> > > > > > > >> 1) It's better to have a whole design for this feature
>>>>>> >>>>> >> >> > >> > > > > > > >> As we know, there are two deployment modes: per-job mode and session mode. I'm wondering which mode really needs this feature.
> As the design doc mentioned, per-job mode is used more for streaming
> jobs and session mode is usually used for batch jobs (of course, the
> job types and the deployment modes are orthogonal). Usually a
> streaming job only needs to be submitted once and will run for days
> or weeks, while batch jobs are submitted more frequently than
> streaming jobs. This means that maybe session mode also needs this
> feature. However, if we support this feature in session mode, the
> application master will become the new centralized service (which
> should be solved). So in this case, it's better to have a complete
> design for both per-job mode and session mode.
> Furthermore, even if we can do it phase by phase, we need to have a
> whole picture of how it works in both per-job mode and session mode.
>
> 2) It's better to consider the convenience for users, such as debugging
> After we finish this feature, the job graph will be compiled in the
> application master, which means that users cannot easily get the
> exception message synchronously in the job client if there are
> problems during job graph compilation (especially for platform
> users), for example when the resource path is incorrect or the user
> program itself has some problems. What I'm thinking is that maybe we
> should throw the exceptions as early as possible (during the job
> submission stage).
>
> 3) It's better to consider the impact to the stability of the cluster
> If we perform the compilation in the application master, we should
> consider the impact of compilation errors. Although YARN could resume
> the application master in case of failures, in some cases a
> compilation failure may waste cluster resources and may impact the
> stability of the cluster and the other jobs in it, for example when
> the resource path is incorrect or the user program itself has some
> problems (in this case, job failover cannot solve this kind of
> problem). In the current implementation, compilation errors are
> handled on the client side and there is no impact to the cluster at
> all.
>
> Regarding 1), the design doc clearly points out that only per-job
> mode will be supported. However, I think it's better to also consider
> session mode in the design doc.
> Regarding 2) and 3), I have not seen related sections in the design
> doc. It would be good if we could cover them in the design doc.
>
> Feel free to correct me if there is anything I misunderstand.
>
> Regards,
> Dian
>
> On Dec 27, 2019, at 3:13 AM, Peter Huang <huangzhenqiu0...@gmail.com> wrote:
>
> Hi Yang,
>
> I can't agree more. The effort definitely needs to align with the
> final goal of FLIP-73.
> I am thinking about whether we can achieve the goal with two phases.
>
> 1) Phase I
> As the CliFrontend will not be deprecated soon, we can still use the
> deployMode flag there, pass the program info through the Flink
> configuration, and use the ClassPathJobGraphRetriever to generate the
> job graph in the ClusterEntrypoints of YARN and Kubernetes.
>
> 2) Phase II
> In AbstractJobClusterExecutor, the job graph is generated in the
> execute function. We can still use the deployMode in it. With
> deployMode = cluster, the execute function only starts the cluster.
>
> When {Yarn/Kubernetes}PerJobClusterEntrypoint starts, it will start
> the dispatcher first; then we can use a ClusterEnvironment similar to
> ContextEnvironment to submit the job with jobName to the local
> dispatcher. For the details, we need more investigation. Let's wait
> for @Aljoscha Krettek <aljos...@apache.org> and @Till Rohrmann
> <trohrm...@apache.org>'s feedback after the holiday season.
>
> Thank you in advance. Merry Christmas and Happy New Year!!!
>
> Best Regards
> Peter Huang
>
> On Wed, Dec 25, 2019 at 1:08 AM Yang Wang <danrtsey...@gmail.com> wrote:
>
> Hi Peter,
>
> I think we need to reconsider tison's suggestion seriously. After
> FLIP-73, the deployJobCluster has been moved into
> `JobClusterExecutor#execute`. It should not be perceived by
> `CliFrontend`. That means the user program will *ALWAYS* be executed
> on the client side. This is the behavior by design.
> So, we could not just add `if(client mode) .. else if(cluster mode) ...`
> code in `CliFrontend` to bypass the executor. We need to find a clean
> way to decouple executing the user program from deploying the per-job
> cluster. Based on this, we could support executing the user program
> on the client or the master side.
>
> Maybe Aljoscha and Jeff could give some good suggestions.
>
> Best,
> Yang
>
> On Wed, Dec 25, 2019 at 4:03 AM Peter Huang <huangzhenqiu0...@gmail.com> wrote:
>
> Hi Jingjing,
>
> The improvement proposed is a deployment option for the CLI.
> For SQL-based Flink applications, it is more convenient to use the
> existing model in SqlClient, in which the job graph is generated
> within SqlClient. After adding the delayed job graph generation, I
> think no change is needed on your side.
>
> Best Regards
> Peter Huang
>
> On Wed, Dec 18, 2019 at 6:01 AM jingjing bai <baijingjing7...@gmail.com> wrote:
>
> Hi Peter,
> We have extended SqlClient to support SQL job submission from the
> web, based on Flink 1.9. We support submitting to YARN in per-job
> mode too.
> In this case, the job graph is generated on the client side.
> I think this discussion is mainly about improving API programs, but
> in my case there is no jar to upload, only a SQL string.
> Do you have more suggestions for improving SQL mode, or is it only a
> switch for API programs?
>
> Best,
> bai jj
>
> On Wed, Dec 18, 2019 at 7:21 PM Yang Wang <danrtsey...@gmail.com> wrote:
>
> I just want to revive this discussion.
>
> Recently, I have been thinking about how to natively run a Flink
> per-job cluster on Kubernetes.
> The per-job mode on Kubernetes is very different from the one on YARN.
> And we will have the same deployment requirements for the client and
> the entry point.
>
> 1. The Flink client does not always need a local jar to start a Flink
> per-job cluster. We could support multiple schemes. For example,
> file:///path/of/my.jar means a jar located on the client side,
> hdfs://myhdfs/user/myname/flink/my.jar means a jar located on remote
> HDFS, and local:///path/in/image/my.jar means a jar located on the
> jobmanager side.
>
> 2. Support running the user program on the master side.
> This also means the entry point will generate the job graph on the
> master side. We could use the ClasspathJobGraphRetriever or start a
> local Flink client to achieve this purpose.
>
> cc tison, Aljoscha & Kostas: do you think this is the right direction
> we need to work in?
>
> On Thu, Dec 12, 2019 at 4:48 PM tison <wander4...@gmail.com> wrote:
>
> A quick idea is that we separate the deployment from the user
> program, so that it is always done outside the program. When the user
> program is executed, there is always a ClusterClient that
> communicates with an existing cluster, remote or local.
> It will be another thread, so this is just for your information.
>
> Best,
> tison.
>
> On Thu, Dec 12, 2019 at 4:40 PM tison <wander4...@gmail.com> wrote:
>
> Hi Peter,
>
> Another concern I realized recently is that with the current
> Executors abstraction (FLIP-73), I'm afraid that the user program is
> designed to ALWAYS run on the client side. Specifically, we deploy
> the job in the executor when env.execute is called. This abstraction
> possibly prevents Flink from running the user program on the cluster
> side.
>
> For your proposal, in this case we have already compiled the program
> and run it on the client side; even if we deploy a cluster and
> retrieve the job graph from program metadata, it doesn't make much
> sense.
>
> cc Aljoscha & Kostas: what do you think about this constraint?
>
> Best,
> tison.
>
> On Tue, Dec 10, 2019 at 12:45 PM Peter Huang <huangzhenqiu0...@gmail.com> wrote:
>
> Hi Tison,
>
> Yes, you are right. I think I made the wrong argument in the doc.
> Basically, the jar packaging problem is only for platform users.
> In our internal deploy service, we further optimized the deployment
> latency by letting users package flink-runtime together with the uber
> jar, so that we don't need to consider multiple Flink version support
> for now. In the session client mode, as Flink libs will be shipped
> anyway as local resources of YARN, users actually don't need to
> package those libs into the job jar.
>
> Best Regards
> Peter Huang
>
> On Mon, Dec 9, 2019 at 8:35 PM tison <wander4...@gmail.com> wrote:
>
> > 3. What do you mean about the package? Do users need to compile
> > their jars including flink-clients, flink-optimizer, flink-table
> > codes?
>
> The answer should be no because they exist in the system classpath.
>
> Best,
> tison.
>
> On Tue, Dec 10, 2019 at 12:18 PM Yang Wang <danrtsey...@gmail.com> wrote:
>
> Hi Peter,
>
> Thanks a lot for starting this discussion.
> I think this is a very useful feature.
>
> Not only for YARN: I am focused on the Flink on Kubernetes
> integration and came across the same problem. I do not want the job
> graph generated on the client side. Instead, the user jars are built
> into a user-defined image. When the job manager is launched, we just
> need to generate the job graph based on local user jars.
>
> I have some small suggestions about this.
>
> 1. `ProgramJobGraphRetriever` is very similar to
> `ClasspathJobGraphRetriever`; the difference is that the former needs
> `ProgramMetadata` and the latter needs some arguments. Is it possible
> to have a unified `JobGraphRetriever` to support both?
> 2. Is it possible to not use a local user jar to start a per-job
> cluster? In your case, the user jars already exist on HDFS and we do
> need to download the jars to the deployer service. Currently, we
> always need a local user jar to start a Flink cluster. It would be
> great if we could support remote user jars.
> >> In the implementation, we assume users package flink-clients,
> >> flink-optimizer, flink-table together within the job jar.
> >> Otherwise, the job graph generation within JobClusterEntryPoint
> >> will fail.
> 3. What do you mean about the package? Do users need to compile their
> jars including flink-clients, flink-optimizer, flink-table codes?

Best,
Yang

On Tue, Dec 10, 2019 at 2:37 AM Peter Huang <huangzhenqiu0...@gmail.com> wrote:

> Dear All,
>
> Recently, the Flink community started to improve the YARN cluster descriptor to make the job jar and config files configurable from the CLI. This improves the flexibility of Flink deployment in YARN per-job mode.
> For platform users who manage tens of hundreds of streaming pipelines for a whole org or company, we found that client-side job graph generation is another pain point. Thus, we want to propose a configurable feature for FlinkYarnSessionCli. The feature allows users to choose job graph generation in the Flink ClusterEntryPoint, so that the job jar doesn't need to be available locally for job graph generation.
> The proposal is organized as a FLIP: https://cwiki.apache.org/confluence/display/FLINK/FLIP-85+Delayed+JobGraph+Generation
>
> Any questions and suggestions are welcome. Thank you in advance.
>
> Best Regards,
> Peter Huang