+1 to start voting.

Best,
tison.
Yang Wang <danrtsey...@gmail.com> wrote on Thu, Mar 5, 2020 at 2:29 PM:

Hi Peter,

Thanks a lot for your response.

Hi all @Kostas Kloudas <kklou...@gmail.com> @Zili Chen <wander4...@gmail.com> @Peter Huang <huangzhenqiu0...@gmail.com> @Rong Rong <walter...@gmail.com>
It seems that we have reached an agreement. The "application mode" is regarded as the enhanced "per-job". It is orthogonal to "cluster deploy". Currently, we bind "per-job" to `run-user-main-on-client` and "application mode" to `run-user-main-on-cluster`.

Do you have any other concerns about moving FLIP-85 to a vote?

Best,
Yang

Peter Huang <huangzhenqiu0...@gmail.com> wrote on Thu, Mar 5, 2020 at 12:48 PM:

Hi Yang and Kostas,

Thanks for the clarification. It makes more sense to me if the long-term goal is to replace per-job mode with application mode in the future (once multiple execute() calls can be supported). Before that, it will be better to keep the concept of application mode internal. As Yang suggested, users only need a `-R/--remote-deploy` CLI option to launch a per-job cluster with the main function executed in the cluster entrypoint. +1 for the execution plan.

Best Regards
Peter Huang

On Tue, Mar 3, 2020 at 7:11 AM Yang Wang <danrtsey...@gmail.com> wrote:

Hi Peter,

Having the application mode does not mean we will drop the cluster-deploy option. I just want to share some thoughts about "Application Mode".

1. The application mode could cover the per-job semantics. Its lifecycle is bound to the user `main()`, and all the jobs in the user main will be executed in the same Flink cluster. In the first phase of the FLIP-85 implementation, running the user main on the cluster side could be supported in application mode.

2. Maybe in the future we also need to support multiple `execute()` calls on the client side against the same Flink cluster. Then the per-job mode will evolve into application mode.

3. From the user's perspective, only a `-R/--remote-deploy` CLI option is visible. They are not aware of the application mode.

4. In the first phase, the application mode works as "per-job" (only one job in the user main). We just leave more potential for the future.

I am not against calling it "cluster deploy mode" if you all think that is clearer for users.

Best,
Yang

Kostas Kloudas <kklou...@gmail.com> wrote on Tue, Mar 3, 2020 at 6:49 PM:

Hi Peter,

I understand your point. This is why I was also a bit torn about the name, and my proposal was fairly aligned with yours (something along the lines of "cluster deploy" mode).

But many of the other participants in the discussion suggested "Application Mode". I think the reasoning is that now the user's application is more self-contained: it will be submitted to the cluster and the user can just disconnect. In addition, as discussed briefly in the doc, in the future there may be better support for multi-execute applications, which will bring us one step closer to the true "Application Mode". But this is how I interpreted their arguments; of course they can also express their own thoughts on the topic :)

Cheers,
Kostas

On Mon, Mar 2, 2020 at 6:15 PM Peter Huang <huangzhenqiu0...@gmail.com> wrote:

Hi Kostas,

Thanks for updating the wiki. We have aligned on the implementation in the doc, but I feel the naming is still a little confusing from a user's perspective. It is well known that Flink supports per-job clusters and session clusters; that concept lives at the layer of how a job is managed within Flink. The method introduced until now is a kind of mix of job and session cluster that keeps the implementation complexity manageable. We probably don't need to label it "Application Mode" at the same layer as per-job cluster and session cluster. Conceptually, I think it is still a cluster-mode implementation of the per-job cluster.

To minimize user confusion, I think it would be better to offer it simply as an option of the per-job cluster for each type of cluster manager. What do you think?

Best Regards
Peter Huang

On Mon, Mar 2, 2020 at 7:22 AM Kostas Kloudas <kklou...@gmail.com> wrote:

Hi Yang,

The difference between per-job and application mode is that, as you described, in per-job mode the main is executed on the client, while in application mode the main is executed on the cluster. I do not think we have to offer "application mode" with the main running on the client side, as this is exactly what per-job mode does currently and, as you also described, it would be redundant.

Sorry if this was not clear in the document.

Cheers,
Kostas

On Mon, Mar 2, 2020 at 3:17 PM Yang Wang <danrtsey...@gmail.com> wrote:

Hi Kostas,

Thanks a lot for your conclusion and for updating the FLIP-85 wiki. Currently I have no more questions about the motivation, approach, fault tolerance, and the first-phase implementation.

The new title "Flink Application Mode" makes a lot of sense to me. Especially for containerized environments, the cluster-deploy option will be very useful.

Just one concern: how do we introduce this new application mode to our users? Each user program (i.e. `main()`) is an application, and currently we intend to support only one `execute()`. So what is the difference between per-job and application mode?

For per-job, the user `main()` is always executed on the client side. For application mode, the user `main()` could be executed on the client or the master side (configured via a CLI option). Right? We need a clear concept here; otherwise users will only get more confused.

Best,
Yang

Kostas Kloudas <kklou...@gmail.com> wrote on Mon, Mar 2, 2020 at 5:58 PM:

Hi all,

I updated https://cwiki.apache.org/confluence/display/FLINK/FLIP-85+Flink+Application+Mode based on the discussion we had here:

https://docs.google.com/document/d/1ji72s3FD9DYUyGuKnJoO4ApzV-nSsZa0-bceGXW7Ocw/edit#

Please let me know what you think, and please keep the discussion on the ML :)

Thanks for starting the discussion, and I hope that soon we will be able to vote on the FLIP.

Cheers,
Kostas
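To make the agreed-on distinction concrete, the following is a minimal sketch of how a CLI front end could branch on a remote-deploy style switch. The config key, class name, and helper methods are hypothetical and invented only for illustration; they are not existing Flink API and not necessarily what FLIP-85 ends up specifying.

    import org.apache.flink.configuration.Configuration;

    public class RemoteDeploySketch {

        // Hypothetical config key carrying the proposed "-R/--remote-deploy" switch.
        private static final String REMOTE_DEPLOY_KEY = "execution.remote-deploy";

        void runProgram(Configuration config) throws Exception {
            if (config.getBoolean(REMOTE_DEPLOY_KEY, false)) {
                // run-user-main-on-cluster: only deploy the cluster here; the
                // entrypoint later invokes the user's main() itself, driven by
                // program metadata shipped in the configuration.
                deployClusterWithProgramMetadata(config);
            } else {
                // run-user-main-on-client (today's per-job behavior): execute the
                // user's main() locally; env.execute() inside it submits the job.
                invokeUserMainLocally(config);
            }
        }

        // Placeholders standing in for the real deployment / invocation logic.
        private void deployClusterWithProgramMetadata(Configuration config) {}

        private void invokeUserMainLocally(Configuration config) {}
    }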
On Thu, Jan 16, 2020 at 3:40 AM Yang Wang <danrtsey...@gmail.com> wrote:

Hi all,

Thanks a lot for the feedback, @Kostas Kloudas. All your concerns are on point. FLIP-85 is mainly focused on supporting cluster mode for per-job, since that is more urgent and has many more use cases in both Yarn and Kubernetes deployments. For the session cluster, we could have more discussion in a new thread later.

#1, How to download the user jars and dependencies for per-job in cluster mode?
For Yarn, we could register the user jars and dependencies as LocalResources. They will be distributed by Yarn, so once the JobManager and TaskManagers are launched the jars already exist locally. For standalone per-job and K8s, we expect the user jars and dependencies to be built into the image, or an init container could be used for downloading. Distribution is handled natively, so we will not have a bottleneck.

#2, Job graph recovery
We could add an optimization to store the job graph on a DFS. However, I suggest making "build a new job graph from the configuration" the default option, since we will not always have a DFS available when deploying a Flink per-job cluster. Of course, we assume that using the same configuration (e.g. job_id, user_jar, main_class, main_args, parallelism, savepoint_settings, etc.) will produce the same job graph. I think standalone per-job already has similar behavior.

#3, What happens with jobs that have multiple execute() calls?
Currently, this is a real problem. Even if we use a local client on the Flink master side, it will behave differently from client mode: in client mode, if we execute multiple times, we deploy a separate Flink cluster for each execute(). I am not sure whether that is reasonable. However, I still think using the local client is a good choice. We could continue the discussion in a new thread. @Zili Chen <wander4...@gmail.com>, do you want to drive this?

Best,
Yang

Peter Huang <huangzhenqiu0...@gmail.com> wrote on Thu, Jan 16, 2020 at 1:55 AM:

Hi Kostas,

Thanks for this feedback. I couldn't agree more. The cluster mode should be added to the per-job cluster first.

1) For the job cluster implementation
1. Job graph recovery from configuration, or storage as a static job graph as in the session cluster. I think the static one will be better, for shorter recovery time. Let me update the doc with details.
2. For jobs that execute multiple times, I think @Zili Chen <wander4...@gmail.com> has proposed the local client solution, which can actually run the program in the cluster entrypoint. We can put that implementation in the second stage, or even in a new FLIP for further discussion.

2) For the session cluster implementation
We can disable the cluster mode for the session cluster in the first stage. I agree the jar downloading will be painful. We can consider a PoC and a performance evaluation first; if the end-to-end experience is good enough, we can consider proceeding with the solution.

Looking forward to more opinions from @Yang Wang <danrtsey...@gmail.com>, @Zili Chen <wander4...@gmail.com>, and @Dian Fu <dian0511...@gmail.com>.

Best Regards
Peter Huang

On Wed, Jan 15, 2020 at 7:50 AM Kostas Kloudas <kklou...@gmail.com> wrote:

Hi all,

I am writing here as the discussion on the Google Doc seems to be a bit difficult to follow.

I think that in order to make progress, it would be helpful to focus on per-job mode for now. The reasons are:
1) making the (unique) JobSubmitHandler responsible for creating the job graphs, which includes downloading dependencies, is not an optimal solution
2) even if we put the responsibility on the JobMaster, currently each job has its own JobMaster, but they all run in the same process, so we again have a single entity.

Of course, after this is done, and if we feel comfortable with the solution, we can move on to the session mode.

A second comment has to do with fault tolerance in the per-job, cluster-deploy mode. In the document, it is suggested that upon recovery the JobMaster of each job re-creates the JobGraph. I am just wondering if it is better to create and store the JobGraph upon submission and only fetch it upon recovery, so that we have a static JobGraph.

Finally, I have a question: what happens with jobs that have multiple execute calls? The semantics seem to change compared to the current behaviour, right?

Cheers,
Kostas

On Wed, Jan 8, 2020 at 8:05 PM tison <wander4...@gmail.com> wrote:

Not necessarily; Yang Wang is also not yet a committer, but he can join the channel. I cannot find your id by clicking "Add new member in channel", so I came to you and asked you to try out the link. I may find other ways, but the original purpose is for the slack channel to be a public area where we discuss development...

Best,
tison.
Peter Huang <huangzhenqiu0...@gmail.com> wrote on Thu, Jan 9, 2020 at 2:44 AM:

Hi Tison,

I am not a Flink committer yet. I think I can't join it either.

Best Regards
Peter Huang

On Wed, Jan 8, 2020 at 9:39 AM tison <wander4...@gmail.com> wrote:

Hi Peter,

Could you try out this link? https://the-asf.slack.com/messages/CNA3ADZPH

Best,
tison.

Peter Huang <huangzhenqiu0...@gmail.com> wrote on Thu, Jan 9, 2020 at 1:22 AM:

Hi Tison,

I can't join the group with the shared link. Would you please add me to the group? My slack account is huangzhenqiu0825. Thank you in advance.

Best Regards
Peter Huang

On Wed, Jan 8, 2020 at 12:02 AM tison <wander4...@gmail.com> wrote:

Hi Peter,

As described above, this effort should get attention from the people developing FLIP-73, a.k.a. the Executor abstractions. I recommend you join the public slack channel [1] for the Flink Client API Enhancement and share your detailed thoughts there. It will likely get more concrete attention.

Best,
tison.

[1] https://slack.com/share/IS21SJ75H/Rk8HhUly9FuEHb7oGwBZ33uL/enQtODg2MDYwNjE5MTg3LTA2MjIzNDc1M2ZjZDVlMjdlZjk1M2RkYmJhNjAwMTk2ZDZkODQ4NmY5YmI4OGRhNWJkYTViMTM1NzlmMzc4OWM

Peter Huang <huangzhenqiu0...@gmail.com> wrote on Tue, Jan 7, 2020 at 5:09 AM:

Dear All,

Happy new year! Based on the existing feedback from the community, we revised the doc to cover session cluster support, the concrete interface changes needed, and the execution plan. Please take one more round of review at your convenience:

https://docs.google.com/document/d/1aAwVjdZByA-0CHbgv16Me-vjaaDMCfhX7TzVVTuifYM/edit#

Best Regards
Peter Huang

On Thu, Jan 2, 2020 at 11:29 AM Peter Huang <huangzhenqiu0...@gmail.com> wrote:

Hi Dian,

Thanks for the valuable feedback.

1) It's better to have a whole design for this feature
Regarding enabling the cluster mode for the session cluster as well, I think Flink already supports it: the WebSubmissionExtension already allows users to start a job with a specified jar via the web UI. But we need to enable the feature from the CLI for both local and remote jars. I will align with Yang Wang on the details first and update the design doc.

2) It's better to consider the convenience for users, such as debugging
I am wondering whether we can store exceptions from job graph generation in the application master. Since no streaming graph can be scheduled in this case, no further TMs will be requested from the Flink RM. If the AM is still running, users can still query it from the CLI. As this requires more changes, we can get some feedback from <aljos...@apache.org> and @zjf...@gmail.com <zjf...@gmail.com>.

3) It's better to consider the impact on the stability of the cluster
I agree with Yang Wang's opinion.

Best Regards
Peter Huang

On Sun, Dec 29, 2019 at 9:44 PM Dian Fu <dian0511...@gmail.com> wrote:

Hi all,

Sorry to jump into this discussion, and thanks everyone for it. I'm very interested in this topic, although I'm not an expert in this part, so I'm glad to share my thoughts as follows:

1) It's better to have a whole design for this feature
As we know, there are two deployment modes: per-job mode and session mode. I'm wondering which mode really needs this feature. As the design doc mentions, per-job mode is used more for streaming jobs and session mode is usually used for batch jobs (of course, the job types and the deployment modes are orthogonal). Usually a streaming job only needs to be submitted once and will then run for days or weeks, while batch jobs are submitted more frequently than streaming jobs. This means that maybe session mode also needs this feature. However, if we support this feature in session mode, the application master will become a new centralized service (which would have to be addressed). So in this case it's better to have a complete design for both per-job mode and session mode. Furthermore, even if we can do it phase by phase, we need a whole picture of how it works in both per-job mode and session mode.

2) It's better to consider the convenience for users, such as debugging
After we finish this feature, the job graph will be compiled in the application master, which means that users cannot easily get the exception message synchronously in the job client if there are problems during job graph compilation (especially for platform users), such as an incorrect resource path, problems in the user program itself, etc. What I'm thinking is that maybe we should throw the exceptions as early as possible (during the job submission stage).

3) It's better to consider the impact on the stability of the cluster
If we perform the compilation in the application master, we should consider the impact of compilation errors. Although YARN can resume the application master in case of failures, in some cases a compilation failure may waste cluster resources and may impact the stability of the cluster and the other jobs in it, for example when the resource path is incorrect or the user program itself has problems (in which case job failover cannot solve the problem), etc. In the current implementation, compilation errors are handled on the client side and there is no impact on the cluster at all.

Regarding 1), the design doc clearly states that only per-job mode will be supported. However, I think it's better to also consider the session mode in the design doc.
Regarding 2) and 3), I have not seen related sections in the design doc. It would be good if we could cover them there.

Feel free to correct me if there is anything I misunderstand.

Regards,
Dian

On Dec 27, 2019, at 3:13 AM, Peter Huang <huangzhenqiu0...@gmail.com> wrote:

Hi Yang,

I can't agree more. The effort definitely needs to align with the final goal of FLIP-73. I am thinking about whether we can achieve the goal in two phases.
1) Phase I
As the CliFrontend will not be deprecated soon, we can still use the deployMode flag there, pass the program info through the Flink configuration, and use the ClassPathJobGraphRetriever to generate the job graph in the ClusterEntrypoints of Yarn and Kubernetes.

2) Phase II
In AbstractJobClusterExecutor, the job graph is generated in the execute function. We can still use the deployMode in it: with deployMode = cluster, the execute function only starts the cluster.

When the {Yarn/Kubernetes}PerJobClusterEntrypoint starts, it will start the dispatcher first; then we can use a ClusterEnvironment, similar to ContextEnvironment, to submit the job by jobName to the local dispatcher. For the details, we need more investigation. Let's wait for @Aljoscha Krettek <aljos...@apache.org> and @Till Rohrmann <trohrm...@apache.org>'s feedback after the holiday season.

Thank you in advance. Merry Christmas and Happy New Year!!!

Best Regards
Peter Huang

On Wed, Dec 25, 2019 at 1:08 AM Yang Wang <danrtsey...@gmail.com> wrote:

Hi Peter,

I think we need to seriously reconsider tison's suggestion. After FLIP-73, deployJobCluster has been moved into `JobClusterExecutor#execute`; it should not be visible to `CliFrontend`. That means the user program will *ALWAYS* be executed on the client side. This is the by-design behavior. So we cannot just add `if (client mode) ... else if (cluster mode) ...` code in `CliFrontend` to bypass the executor. We need to find a clean way to decouple executing the user program from deploying the per-job cluster. Based on that, we could support executing the user program on either the client or the master side.

Maybe Aljoscha and Jeff could give some good suggestions.

Best,
Yang

Peter Huang <huangzhenqiu0...@gmail.com> wrote on Wed, Dec 25, 2019 at 4:03 AM:

Hi Jingjing,

The proposed improvement is a deployment option for the CLI. For SQL-based Flink applications, it is more convenient to use the existing model in the SqlClient, in which the job graph is generated within the SqlClient. After adding the delayed job graph generation, I think no change is needed on your side.

Best Regards
Peter Huang

On Wed, Dec 18, 2019 at 6:01 AM jingjing bai <baijingjing7...@gmail.com> wrote:

Hi Peter,

We have extended the SqlClient to support SQL job submission from the web, based on Flink 1.9, and we support submitting to Yarn in per-job mode too. In this case, the job graph is generated on the client side. I think this discussion is mainly about improving API programs, but in my case there is no jar to upload, only a SQL string. Do you have more suggestions for improving the SQL mode, or is this only a switch for API programs?

Best,
bai jj

Yang Wang <danrtsey...@gmail.com> wrote on Wed, Dec 18, 2019 at 7:21 PM:

I just want to revive this discussion.

Recently I have been thinking about how to natively run a Flink per-job cluster on Kubernetes. The per-job mode on Kubernetes is very different from Yarn, and we have the same deployment requirements for the client and the entrypoint.

1. The Flink client does not always need a local jar to start a Flink per-job cluster. We could support multiple schemes. For example, file:///path/of/my.jar means a jar located on the client side, hdfs://myhdfs/user/myname/flink/my.jar means a jar located on a remote HDFS, and local:///path/in/image/my.jar means a jar located on the jobmanager side.

2. Support running the user program on the master side. This also means the entrypoint will generate the job graph on the master side. We could use the ClasspathJobGraphRetriever or start a local Flink client to achieve this purpose.

cc tison, Aljoscha & Kostas: do you think this is the right direction to work in?
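Yang Wang's first point (multiple jar schemes) essentially means resolving the jar URI by scheme before anything tries to build the job graph. Below is a minimal, illustrative sketch of such a resolution step using Flink's FileSystem abstraction; the class name and the per-scheme handling are assumptions made for the example, not part of the FLIP.

    import java.io.InputStream;
    import java.net.URI;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.nio.file.StandardCopyOption;

    import org.apache.flink.core.fs.FileSystem;
    import org.apache.flink.core.fs.Path;

    public class UserJarResolver {

        // Resolves a user-jar URI to a local path usable by whichever component
        // builds the job graph. Sketch only:
        //   file://...  -> jar already on this machine, use as-is
        //   local://... -> jar baked into the image, only valid on the
        //                  jobmanager side, so just strip the scheme
        //   otherwise (e.g. hdfs://...) -> fetch it through Flink's FileSystem
        //                  abstraction into a temporary local file
        public static java.nio.file.Path resolve(URI jarUri) throws Exception {
            String scheme = jarUri.getScheme();
            if ("file".equals(scheme) || "local".equals(scheme)) {
                return Paths.get(jarUri.getPath());
            }
            java.nio.file.Path target = Files.createTempFile("user-job", ".jar");
            FileSystem fs = FileSystem.get(jarUri);
            try (InputStream in = fs.open(new Path(jarUri))) {
                Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
            }
            return target;
        }
    }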
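For the second point, generating the job graph on the master side, a rough sketch of rebuilding a JobGraph purely from values carried in the Flink configuration follows. The `$internal.*` keys are invented placeholders, and the exact PackagedProgram/PackagedProgramUtils signatures vary between Flink versions, so treat this as an outline of the idea rather than the actual FLIP-85 implementation.

    import java.io.File;

    import org.apache.flink.client.program.PackagedProgram;
    import org.apache.flink.client.program.PackagedProgramUtils;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.runtime.jobgraph.JobGraph;

    public class ConfigurationJobGraphFactory {

        // Rebuilds a JobGraph from values carried in the configuration
        // (user jar, main class, args, parallelism), so the same configuration
        // yields the same graph on first start and on recovery.
        public static JobGraph createFromConfiguration(Configuration config) throws Exception {
            File userJar = new File(config.getString("$internal.user-jar", ""));
            String mainClass = config.getString("$internal.main-class", "");
            String[] args = config.getString("$internal.program-args", "").split(" ");
            int parallelism = config.getInteger("$internal.default-parallelism", 1);

            PackagedProgram program =
                    PackagedProgram.newBuilder()
                            .setJarFile(userJar)
                            .setEntryPointClassName(mainClass)
                            .setArguments(args)
                            .build();

            // Compiles the user program into a JobGraph on the cluster side,
            // without contacting any client.
            return PackagedProgramUtils.createJobGraph(program, config, parallelism, false);
        }
    }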
tison <wander4...@gmail.com> wrote on Thu, Dec 12, 2019 at 4:48 PM:

A quick idea is to separate the deployment from the user program, so that it is always done outside the program. When the user program executes, there is always a ClusterClient that communicates with an existing cluster, remote or local. That will be another thread, so this is just for your information.

Best,
tison.

tison <wander4...@gmail.com> wrote on Thu, Dec 12, 2019 at 4:40 PM:

Hi Peter,

Another concern I realized recently is that, with the current Executors abstraction (FLIP-73), I'm afraid the user program is designed to ALWAYS run on the client side. Specifically, we deploy the job in the executor when env.execute() is called. This abstraction possibly prevents Flink from running the user program on the cluster side.

For your proposal, in this case we have already compiled the program and run it on the client side; even if we deploy a cluster and retrieve the job graph from program metadata, it doesn't make much sense.

cc Aljoscha & Kostas: what do you think about this constraint?

Best,
tison.
Peter Huang <huangzhenqiu0...@gmail.com> wrote on Tue, Dec 10, 2019 at 12:45 PM:

Hi Tison,

Yes, you are right. I think I made the wrong argument in the doc. Basically, the packaging-jar problem only exists for platform users. In our internal deploy service, we further optimized the deployment latency by letting users package flink-runtime together with the uber jar, so that we don't need to consider supporting multiple Flink versions for now. In the session/client mode, the Flink libs will be shipped anyway as Yarn local resources, so users don't actually need to package those libs into the job jar.

Best Regards
Peter Huang

On Mon, Dec 9, 2019 at 8:35 PM tison <wander4...@gmail.com> wrote:

> 3. What do you mean about the packaging? Do users need to compile their jars including flink-clients, flink-optimizer, flink-table code?

The answer should be no, because they exist in the system classpath.

Best,
tison.
Yang Wang <danrtsey...@gmail.com> wrote on Tue, Dec 10, 2019 at 12:18 PM:

Hi Peter,

Thanks a lot for starting this discussion. I think this is a very useful feature.

Not only for Yarn: I am focused on the Flink-on-Kubernetes integration and have come across the same problem. I do not want the job graph to be generated on the client side. Instead, the user jars are built into a user-defined image; when the job manager is launched, we just need to generate the job graph based on the local user jars.

I have some small suggestions about this.

1. `ProgramJobGraphRetriever` is very similar to `ClasspathJobGraphRetriever`; the difference is that the former needs `ProgramMetadata` and the latter needs some arguments. Is it possible to have a unified `JobGraphRetriever` to support both?
2. Is it possible to start a per-job cluster without a local user jar? In your case, the user jars already exist on HDFS and we would need to download them to the deployer service. Currently, we always need a local user jar to start a Flink cluster. It would be great if we could support remote user jars.
> In the implementation, we assume users package flink-clients, flink-optimizer, flink-table together within the job jar. Otherwise, the job graph generation within JobClusterEntryPoint will fail.
3. What do you mean about the packaging? Do users need to compile their jars including flink-clients, flink-optimizer, flink-table code?

Best,
Yang

Peter Huang <huangzhenqiu0...@gmail.com> wrote on Tue, Dec 10, 2019 at 2:37 AM:

Dear All,

Recently, the Flink community has started to improve the Yarn cluster descriptor to make the job jar and config files configurable from the CLI. This improves the flexibility of Flink deployment in Yarn per-job mode. For platform users who manage tens to hundreds of streaming pipelines for a whole org or company, we found that job graph generation on the client side is another pain point. Thus, we want to propose a configurable feature for FlinkYarnSessionCli. The feature allows users to choose job graph generation in the Flink ClusterEntryPoint, so that the job jar doesn't need to be local for the job graph generation. The proposal is organized as a FLIP:

https://cwiki.apache.org/confluence/display/FLINK/FLIP-85+Delayed+JobGraph+Generation

Any questions and suggestions are welcome. Thank you in advance.

Best Regards
Peter Huang
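Relating Yang Wang's question 1 to the proposal above, a rough sketch of a single retriever abstraction that the cluster entrypoint could depend on is given below. Flink's runtime already has a JobGraphRetriever interface roughly of this shape; the names, packages, and the two outlined implementations here are invented for illustration and are not the FLIP's actual design.

    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.runtime.jobgraph.JobGraph;

    // One abstraction the entrypoint could use, regardless of whether the graph
    // comes from explicit program metadata or from a jar on the classpath.
    interface UnifiedJobGraphRetriever {

        JobGraph retrieveJobGraph(Configuration configuration) throws Exception;
    }

    // Builds the graph from explicit program metadata (jar URI, main class, args).
    class MetadataJobGraphRetriever implements UnifiedJobGraphRetriever {
        @Override
        public JobGraph retrieveJobGraph(Configuration configuration) throws Exception {
            // e.g. delegate to something like ConfigurationJobGraphFactory sketched earlier
            throw new UnsupportedOperationException("sketch only");
        }
    }

    // Builds the graph from a user jar already present on the entrypoint's classpath.
    class ClasspathBasedJobGraphRetriever implements UnifiedJobGraphRetriever {
        @Override
        public JobGraph retrieveJobGraph(Configuration configuration) throws Exception {
            // e.g. locate the user jar in the job dir / classpath, then compile it
            throw new UnsupportedOperationException("sketch only");
        }
    }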