Re: [DISCUSS] FLIP-73: Introducing Executors for job submission

Zili Chen Fri, 27 Sep 2019 01:37:46 -0700

Thanks for your reply, Kostas.

As mentioned in the FLIP-74 thread[1], there are two questions on the
Executor design:

(1) Where the Executor sits in a multi-layered view of the clients.
(2) A few more details about the PerJobExecutor implementation.

For (1), where the Executor sits in a multi-layered view of the clients:

As described in the multi-layered client thread[2], in our current
codebase, with the JobClient introduced in FLIP-74, clients can be
layered as follows:

1) ClusterDescriptor: interacts with the external resource manager;
responsible for deploying a Flink application cluster and retrieving a
Flink application cluster client.
2) ClusterClient: interacts with the Flink application cluster;
responsible for querying cluster-level status, submitting Flink jobs,
and retrieving Flink job clients.
3) JobClient: interacts with a Flink job; responsible for querying
job-level status and performing job-level operations such as triggering
a savepoint.

However, the exception is JobCluster, which couples cluster deployment
and job submission to some degree. From my perspective, with FLIP-73 and
Kostas's thoughts in the FLIP-74 thread, we form a multi-layered client
as below:

1) Executor: responsible for job submission; whether the corresponding
cluster is a SessionCluster or a JobCluster doesn't matter. The Executor
always returns a JobClient.
2) ClusterClientFactory: responsible for deploying a session cluster and
retrieving a session cluster client.
3) ClusterClient: interacts with the session cluster; responsible for
querying cluster-level status, submitting Flink jobs, and retrieving
Flink job clients.
4) JobClient: interacts with a Flink job; responsible for querying
job-level status and performing job-level operations such as triggering
a savepoint.
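
To make the layering concrete, here is a rough Java sketch. Every type
name, method name, and signature in it (JobClient, ClusterClient,
ClusterClientFactory, Executor, InMemoryJobClient, LayeredClientSketch)
is an assumption made up for illustration; these are not the interfaces
proposed in FLIP-73 or FLIP-74, and the in-memory "cluster" only exists
so that the sketch runs.

```java
import java.util.concurrent.CompletableFuture;

// All types below are hypothetical stand-ins for illustration only;
// they are not the interfaces proposed in FLIP-73 or FLIP-74.

/** Layer 4: handle to a single job. */
interface JobClient {
    String getJobId();
    CompletableFuture<String> triggerSavepoint(String targetDirectory);
}

/** Layer 3: talks to a running session cluster. */
interface ClusterClient {
    CompletableFuture<JobClient> submitJob(String jobName);
}

/** Layer 2: deploys a session cluster and returns its client. */
interface ClusterClientFactory {
    ClusterClient deploySessionCluster();
}

/** Layer 1: submits a job and always hands back a JobClient. */
interface Executor {
    CompletableFuture<JobClient> execute(String jobName);
}

/** In-memory JobClient used only to make the sketch runnable. */
final class InMemoryJobClient implements JobClient {
    private final String jobName;

    InMemoryJobClient(String jobName) {
        this.jobName = jobName;
    }

    @Override
    public String getJobId() {
        return "job-" + jobName;
    }

    @Override
    public CompletableFuture<String> triggerSavepoint(String targetDirectory) {
        return CompletableFuture.completedFuture(
                targetDirectory + "/savepoint-" + jobName);
    }
}

final class LayeredClientSketch {
    /** A session-style Executor that simply delegates to the ClusterClient. */
    static Executor sessionExecutor(ClusterClientFactory factory) {
        ClusterClient client = factory.deploySessionCluster();
        return client::submitJob;
    }

    /** Fake factory whose "cluster" completes every submission immediately. */
    static ClusterClientFactory inMemoryFactory() {
        ClusterClient client = jobName ->
                CompletableFuture.<JobClient>completedFuture(
                        new InMemoryJobClient(jobName));
        return () -> client;
    }

    /** Walks the layers top-down: Executor, then JobClient operations. */
    static String demo() {
        Executor executor = sessionExecutor(inMemoryFactory());
        JobClient job = executor.execute("wordcount").join();
        return job.getJobId() + " " + job.triggerSavepoint("/tmp").join();
    }
}
```

In this sketch the session-style Executor is nothing more than a method
reference to ClusterClient#submitJob, so the caller only ever sees the
Executor and the JobClient it returns.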

I am not sure whether the structure above matches what you have in mind.
If so, there are two questions:

I) It seems we cannot have a ClusterClient for a JobCluster. Is that
expected (because the cluster is bound to the job)?
II) It seems we treat the session cluster quite differently from the job
cluster, yet the cluster client can submit a job, which overlaps
somewhat with the Executor.

For (2), a few more details about the PerJobExecutor implementation:

FLIP-73 itself does not describe what PerJobExecutor would look like,
although it is touched on briefly in the design document[3]. In the
FLIP-74 thread I forwarded earlier insights from our community, which
point towards two attributes of JobCluster:

I) Running a Flink job by invoking the user's main method and executing
it throughout, instead of creating a JobGraph from the main class.
II) Running the client inside the cluster.

Does PerJobExecutor fit these requirements? In any case, it would be
helpful to describe the abstraction of Executor in the FLIP; at the very
least, the difference between PerJobExecutor and SessionExecutor is
essential.

Best,
tison.

[1]
https://lists.apache.org/x/thread.html/b2e22a45aeb94a8d06b50c4de078f7b23d9ff08b8226918a1a903768@%3Cdev.flink.apache.org%3E
[2]
https://lists.apache.org/x/thread.html/240582148eda905a772d59b2424cb38fa16ab993647824d178cacb02@%3Cdev.flink.apache.org%3E
[3]
https://docs.google.com/document/d/1E-8UjOLz4QPUTxetGWbU23OlsIH9VIdodpTsxwoQTs0/edit?ts=5d8cbe34#heading=h.qq4wc2suukg


Kostas Kloudas <[email protected]> wrote on Wed, Sep 25, 2019 at 8:27 PM:

> Hi,
>
> @Aljoscha, I believe that it is better to be done like this so that we
> do not step on each other's toes. If the executor already "knew" about
> the JobClient, then we should also know how we expect the
> JobExecutionResult to be retrieved (which is part of FLIP-74). I think it
> is nice to have each discussion self-contained.
>
> Cheers,
> Kostas
>
> On Wed, Sep 25, 2019 at 2:13 PM Aljoscha Krettek <[email protected]>
> wrote:
> >
> > Hi,
> >
> > I’m fine with either signature for the new execute() method but I think
> > we should focus on the executor discovery and executor configuration part
> > in this FLIP while FLIP-74 is about the evolution of the method signature
> > to return a future.
> >
> > I understand that it’s a bit weird, that this FLIP introduces a new
> > interface only to be changed within the same Flink release in a follow-up
> > FLIP. But I think we can still do it. What do you think?
> >
> > Best,
> > Aljoscha
> >
> > > On 25. Sep 2019, at 10:11, Kostas Kloudas <[email protected]> wrote:
> > >
> > > Hi Thomas and Zili,
> > >
> > > As you both said the Executor is a new addition so there are no
> > > compatibility concerns.
> > > Backwards compatibility comes into play on the
> > > (Stream)ExecutionEnvironment#execute().
> > >
> > > This method has to stay and keep having the same (blocking) semantics,
> > > but we can add a new one, something along the lines of executeAsync(),
> > > that will return the JobClient and will allow the caller to interact
> > > with the job.
> > >
> > > Cheers,
> > > Kostas
> > >
> > > On Wed, Sep 25, 2019 at 2:44 AM Zili Chen <[email protected]>
> > > wrote:
> > >>
> > >>> Since Executor is a new interface, why is backward compatibility a
> > >>> concern?
> > >>
> > >> The backward compatibility concern is on
> > >> (Stream)ExecutionEnvironment#execute. You're right that, on the
> > >> Executor side, we need not stick to blocking and returning a
> > >> JobExecutionResult; instead we could implement env.execute on top of
> > >> a single
> > >>
> > >> Executor#execute(or with suffix Async): CompletableFuture<JobClient>
> > >>
> > >> what do you think @Kostas Kloudas?
> > >>
> > >>> I could see that become an issue later when replacing Executor
> > >>> execute with executeAsync. Or are both targeted for 1.10?
> > >>
> > >> IIUC both Executors and JobClient are targeted for 1.10.
> > >>
> > >>
> > >> Thomas Weise <[email protected]> wrote on Wed, Sep 25, 2019 at 2:39 AM:
> > >>>
> > >>> Since Executor is a new interface, why is backward compatibility a
> > >>> concern?
> > >>>
> > >>> I could see that become an issue later when replacing Executor
> > >>> execute with executeAsync. Or are both targeted for 1.10?
> > >>>
> > >>>
> > >>> On Tue, Sep 24, 2019 at 10:24 AM Zili Chen <[email protected]>
> > >>> wrote:
> > >>>
> > >>>> Hi Thomas,
> > >>>>
> > >>>>> Should the new Executor execute method be defined as asynchronous?
> > >>>>> It could return a job handle to interact with the job and the legacy
> > >>>>> environments can still block to retain their semantics.
> > >>>>
> > >>>> Per our discussion, there will be a method
> > >>>>
> > >>>> executeAsync(...): CompletableFuture<JobClient>
> > >>>>
> > >>>> where JobClient can be regarded as job handle in your context.
> > >>>>
> > >>>> I think we keep
> > >>>>
> > >>>> execute(...): JobExecutionResult
> > >>>>
> > >>>> just for backward compatibility, because this effort targets 1.10,
> > >>>> which is not a major version bump.
> > >>>>
> > >>>> BTW, I am drafting the details of JobClient (as FLIP-74). I will
> > >>>> start a separate discussion thread on that interface as soon as I
> > >>>> finish an early version.
> > >>>>
> > >>>> Best,
> > >>>> tison.
> > >>>>
> > >>>>
> > >>>> Thomas Weise <[email protected]> wrote on Wed, Sep 25, 2019 at 1:17 AM:
> > >>>>
> > >>>>> Thanks for the proposal. These changes will make it significantly
> > >>>>> easier to programmatically use Flink in downstream frameworks.
> > >>>>>
> > >>>>> Should the new Executor execute method be defined as asynchronous?
> > >>>>> It could return a job handle to interact with the job and the legacy
> > >>>>> environments can still block to retain their semantics.
> > >>>>>
> > >>>>> (The blocking execution has also made things more difficult in
> > >>>>> Beam, we could simply switch to use Executor directly.)
> > >>>>>
> > >>>>> Thomas
> > >>>>>
> > >>>>>
> > >>>>> On Tue, Sep 24, 2019 at 6:48 AM Kostas Kloudas <[email protected]>
> > >>>>> wrote:
> > >>>>>
> > >>>>>> Hi all,
> > >>>>>>
> > >>>>>> In the context of the discussion about introducing the Job Client
> > >>>>>> API [1], there was a side-discussion about refactoring the way users
> > >>>>>> submit jobs in Flink. There were many different interesting ideas on
> > >>>>>> the topic and 3 design documents that were trying to tackle both the
> > >>>>>> issue about code submission and the Job Client API.
> > >>>>>>
> > >>>>>> This discussion thread aims at the job submission part and proposes
> > >>>>>> the approach of introducing the Executor abstraction which will
> > >>>>>> abstract the job submission logic from the Environments and will
> > >>>>>> make it API agnostic.
> > >>>>>>
> > >>>>>> The FLIP can be found at [2].
> > >>>>>>
> > >>>>>> Please keep the discussion here, in the mailing list.
> > >>>>>>
> > >>>>>> Looking forward to your opinions,
> > >>>>>> Kostas
> > >>>>>>
> > >>>>>> [1]
> > >>>>>> https://lists.apache.org/thread.html/ce99cba4a10b9dc40eb729d39910f315ae41d80ec74f09a356c73938@%3Cdev.flink.apache.org%3E
> > >>>>>> [2]
> > >>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-73%3A+Introducing+Executors+for+job+submission
> >
>
