Hi Tison,

Thanks for the FLIP and for launching the discussion!

As a first note, big +1 on providing/exposing a JobClient to the users!

Some points that would be nice to have clarified:

1) You mention that we can get rid of the DETACHED mode. I agree that at a high level, given that everything will now be asynchronous, there is no need to keep the DETACHED mode, but I think we should specify some aspects. For example, without the explicit separation of the modes, what happens when the job finishes? Does the client always poll periodically for the result, or is the result pushed when in NON-DETACHED mode? What happens if the client disconnects and reconnects?

2) On "how to retrieve a JobClient for a running Job", I think this is related to the other discussion you opened on the ML about multi-layered clients. First of all, I agree that exposing different "levels" of clients would be a nice addition, and there have actually been some discussions about doing so in the future. Now for this specific discussion:

i) I do not think that we should expose the ClusterDescriptor/ClusterSpecification to the user, as this ties us to a specific architecture which may change in the future.

ii) I do not think it should be the Executor that provides a JobClient for an already running job (only for the jobs that it submits). The job of the executor should just be to execute() a pipeline.

iii) I think a solution that respects the separation of concerns could be the addition of another component (in the future), something like a ClientFactory or ClusterFactory, that will have methods like: ClusterClient createCluster(Configuration), JobClient retrieveJobClient(Configuration, JobId), maybe even (although not sure) Executor getExecutor(Configuration), and maybe more. This component would be responsible for interacting with a cluster manager like Yarn and for doing what is now being done by the ClusterDescriptor, plus some more stuff.

Although under the hood all these abstractions (Environments, Executors, ...) use the same clients, I believe their roles do not contradict each other: they simply hide some of the complexity from the user and give us, as developers, some freedom to change some of the parts in the future. For example, the executor will take a Pipeline, create a JobGraph and submit it, instead of requiring the user to do each step separately. This allows us to, for example, get rid of the Plan if in the future everything is DataStream. Essentially, I think of these as layers of an onion, with the clients being close to the core. The higher you go, the more functionality is included and hidden from the public eye.

Point iii), by the way, is just a thought and by no means final. I also like the idea of multi-layered clients, so this may spark up the discussion.

Cheers,
Kostas

On Wed, Sep 25, 2019 at 2:21 PM Aljoscha Krettek <aljos...@apache.org> wrote:
>
> Hi Tison,
>
> Thanks for proposing the document! I had some comments on the document.
>
> I think the only complex thing that we still need to figure out is how to get
> a JobClient for a job that is already running. As you mentioned in the
> document. Currently I’m thinking that its ok to add a method to Executor for
> retrieving a JobClient for a running job by providing an ID. Let’s see what
> Kostas has to say on the topic.
>
> Best,
> Aljoscha
>
> > On 25. Sep 2019, at 12:31, Zili Chen <wander4...@gmail.com> wrote:
> >
> > Hi all,
> >
> > Summary from the discussion about introducing Flink JobClient API[1] we
> > draft FLIP-74[2] to
> > gather thoughts and towards a standard public user-facing interfaces.
> >
> > This discussion thread aims at standardizing job level client API. But I'd
> > like to emphasize that
> > how to retrieve JobClient possibly causes further discussion on different
> > level clients exposed from
> > Flink so that a following thread will be started later to coordinate
> > FLIP-73 and FLIP-74 on
> > expose issue.
> >
> > Looking forward to your opinions.
> >
> > Best,
> > tison.
> >
> > [1] https://lists.apache.org/thread.html/ce99cba4a10b9dc40eb729d39910f315ae41d80ec74f09a356c73938@%3Cdev.flink.apache.org%3E
> > [2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-74%3A+Flink+JobClient+API
>
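
To make point iii) of the reply at the top of this thread a bit more concrete, here is a minimal Java sketch of what a ClientFactory-style component might look like. It is only an illustration under assumptions: the three method names come from the reply, but every type below (Configuration, JobId, JobClient, ClusterClient, Executor) is a hypothetical stand-in defined in the sketch itself, not one of Flink's real classes. The in-memory implementation and the use of a plain String as the "pipeline" are also made up for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch only: none of these types are Flink's actual classes.
final class ClientFactorySketch {

    /** Stand-in for a configuration object (assumption). */
    static final class Configuration { }

    /** Stand-in for a job identifier (assumption). */
    static final class JobId {
        private final String value;
        JobId(String value) { this.value = value; }
        String value() { return value; }
        @Override public boolean equals(Object o) {
            return o instanceof JobId && ((JobId) o).value.equals(value);
        }
        @Override public int hashCode() { return value.hashCode(); }
    }

    /** Handle to a single job; a real one would also offer cancel, status, etc. */
    interface JobClient { JobId getJobId(); }

    /** Handle to a cluster; the "pipeline" is just a name here for brevity. */
    interface ClusterClient { JobClient submit(String pipelineName); }

    /** The executor only executes a pipeline and returns a client for that job. */
    interface Executor { JobClient execute(String pipelineName); }

    /** The component from point iii): the single place that talks to the cluster manager. */
    interface ClientFactory {
        ClusterClient createCluster(Configuration configuration);
        JobClient retrieveJobClient(Configuration configuration, JobId jobId);
        Executor getExecutor(Configuration configuration);
    }

    /** Toy in-memory implementation so the sketch can be exercised end to end. */
    static final class InMemoryClientFactory implements ClientFactory {
        private final Map<JobId, JobClient> runningJobs = new HashMap<>();
        private int nextJob = 0;

        @Override public ClusterClient createCluster(Configuration configuration) {
            return pipelineName -> {
                JobId id = new JobId(pipelineName + "-" + (nextJob++));
                JobClient client = () -> id;
                runningJobs.put(id, client); // remember the job for later retrieval
                return client;
            };
        }

        @Override public JobClient retrieveJobClient(Configuration configuration, JobId jobId) {
            JobClient client = runningJobs.get(jobId);
            if (client == null) {
                throw new IllegalArgumentException("Unknown job: " + jobId.value());
            }
            return client;
        }

        @Override public Executor getExecutor(Configuration configuration) {
            // The executor is a thin layer on top of the cluster client.
            ClusterClient cluster = createCluster(configuration);
            return cluster::submit;
        }
    }
}
```

In this sketch the Executor hands out a JobClient only for the job it submits, while retrieving a client for an already running job goes through the factory, which mirrors the split argued for in points ii) and iii).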