Re: [DISCUSS] FLIP-323: Support Attached Execution on Flink Application Completion for Batch Jobs

liu ron Wed, 16 Aug 2023 09:31:24 -0700

Hi, Jiangjie

Sorry for the late reply, and thank you for such a detailed response. As you
say, there are three behaviours here from the user's point of view, and I
agree with that breakdown. I also agree that the goal of this FLIP is to
clarify the client-side behaviour. However, as Weihua said, the config
execution.attached is used not only in per-job mode but also in session
mode. The FLIP currently says that it applies only to per-job mode and that
it will be removed in the future because per-job mode has been deprecated. I
don't think this is correct, and we should change the description in the
corresponding section of the FLIP. Since execution.attached is also used in
session mode, there is a compatibility issue if we change it directly to
client.attached.after.submission, and I think we should make this clear in
the FLIP.
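
To illustrate the compatibility concern, here is a minimal sketch of one way
a renamed option could keep honouring the old key. This is not Flink code:
the helper resolve_attached and its fallback rule (new key first, then old
key, then a default of false) are assumptions made only for illustration.

```python
# Illustrative sketch only; this is not Flink code.
# NEW_KEY is the name proposed in the FLIP, OLD_KEY is the existing option.
NEW_KEY = "client.attached.after.submission"
OLD_KEY = "execution.attached"


def resolve_attached(conf: dict, default: bool = False) -> bool:
    """Hypothetical helper: prefer the new key, fall back to the old one."""
    for key in (NEW_KEY, OLD_KEY):
        if key in conf:
            # Accept both real booleans and "true"/"false" strings.
            return str(conf[key]).strip().lower() == "true"
    return default


# A session-mode setup that sets only the old key keeps its behaviour.
print(resolve_attached({OLD_KEY: "true"}))
# When both keys are present, the new key takes precedence.
print(resolve_attached({OLD_KEY: "true", NEW_KEY: "false"}))
```

Whatever mechanism the FLIP ends up using, the point is that existing
session-mode users who set only execution.attached should not silently lose
the attached behaviour.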


Best,
Ron

On Mon, Aug 14, 2023 at 20:33, Becket Qin <becket....@gmail.com> wrote:

> Hi Ron and Weihua,
>
> Thanks for the feedback.
>
> There seem to be three user-visible behaviors that we are talking about:
>
> 1. The behavior on the client side, i.e. whether it blocks until the job
> finishes or not.
>
> 2. The behavior of the submitted job, i.e. whether to stop the job
> execution if the client is detached from the Flink cluster, or in other
> words whether to bind the lifecycle of the job to the connection status of
> the attached client. For example, one might want to keep a batch job
> running until it finishes even after the client connection is lost. But it
> makes sense to stop the job when the client connection is lost if
> collect() was invoked on a streaming job.
>
> 3. The behavior of the Flink cluster (JM and TMs), i.e. whether to shut
> down the Flink cluster if the client is detached from the Flink cluster, or
> in other words whether to bind the cluster lifecycle to the job lifecycle.
> For dedicated clusters (application clusters or dedicated session
> clusters), the lifecycle of the cluster should be bound to the job
> lifecycle. But for shared session clusters, the lifecycle of the Flink
> cluster should be independent of the jobs running in it.
>
> As we can see, these three behaviors are largely independent, but the
> current configurations fail to support all the combinations of wanted
> behaviors. Ideally there should be three separate configurations, for
> example:
> - client.attached.after.submission and client.heartbeat.timeout control the
> behavior on the client side.
> - jobmanager.cancel-on-attached-client-exit controls the behavior of the
> job when an attached client loses its connection. The client heartbeat
> timeout and attach-ness will also be passed to the JM upon job submission.
> - cluster.shutdown-on-first-job-finishes (or
> jobmanager.shutdown-cluster-after-job-finishes) controls the cluster
> behavior after the job finishes normally / abnormally. This is a cluster
> level setting instead of a job level setting. Therefore it can only be set
> when launching the cluster.
>
> The current code more or less combines configs 2 and 3 into
> execution.shutdown-on-attached-exit.
> This assumes that the lifecycle of the cluster is the same as that of the
> job when the client is attached. This FLIP does not intend to change that,
> but using the execution.attached config to control the client behavior
> looks misleading. So this FLIP proposes to replace it with a more intuitive
> config, client.attached.after.submission. This makes it clear that it is
> a configuration controlling the client-side behavior, instead of the
> execution of the job.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
>
>
>
>
> On Thu, Aug 10, 2023 at 10:34 PM Weihua Hu <huweihua....@gmail.com> wrote:
>
> > Hi Allison
> >
> > Thanks for driving this FLIP. It's a valuable feature for batch jobs.
> > This helps keep "Drop Per-Job Mode [1]" going.
> >
> > +1 for this proposal.
> >
> > However, it seems that the change in this FLIP is not detailed enough.
> > I have a few questions.
> >
> > 1. The config 'execution.attached' is not only used in per-job mode,
> > but also in session mode to shut down the cluster. IMHO, it's better to
> > keep this option name.
> >
> > 2. This FLIP only mentions YARN mode. I believe this feature should
> > work in both YARN and Kubernetes mode.
> >
> > 3. Within the attach mode, we support two features:
> > execution.shutdown-on-attached-exit
> > and client.heartbeat.timeout. These should also be taken into account.
> >
> > 4. The Application Mode cluster will shut down once the job has been
> > completed. So, if we use the Flink client to poll the job status via the
> > REST API in attach mode, there is a chance that the client will not be
> > able to retrieve the job's finished status.
> > Perhaps FLINK-24113[3] will help with this.
> >
> >
> > [1] https://issues.apache.org/jira/browse/FLINK-26000
> > [2] https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/native_kubernetes/#session-mode
> > [3] https://issues.apache.org/jira/browse/FLINK-24113
> >
> > Best,
> > Weihua
> >
> >
> > On Thu, Aug 10, 2023 at 10:47 AM liu ron <ron9....@gmail.com> wrote:
> >
> > > Hi, Allison
> > >
> > > Thanks for driving this proposal, it looks cool for batch jobs under
> > > application mode. But after reading your FLIP document and [1], I have a
> > > question. Why do you want to rename the execution.attached configuration
> > > to client.attached.after.submission and at the same time deprecate
> > > execution.attached? Based on your design, I understand the role of these
> > > two options is the same. Introducing a new option would increase the cost
> > > of understanding and use for the user, so why not follow the idea
> > > discussed in FLINK-25495 and make Application mode support
> > > execution.attached?
> > >
> > > [1] https://issues.apache.org/jira/browse/FLINK-25495
> > >
> > > Best,
> > > Ron
> > >
> > > On Wed, Aug 9, 2023 at 02:07, Venkatakrishnan Sowrirajan <vsowr...@asu.edu> wrote:
> > >
> > > > This is definitely a useful feature, especially for Flink batch
> > > > execution workloads using flow orchestrators like Airflow, Azkaban,
> > > > Oozie, etc. Thanks for reviving this issue and starting a FLIP.
> > > >
> > > > Regards
> > > > Venkata krishnan
> > > >
> > > >
> > > > On Mon, Aug 7, 2023 at 4:09 PM Allison Chang
> > > > <alch...@linkedin.com.invalid> wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > I am opening this thread to discuss this proposal to support attached
> > > > > execution on Flink Application Completion for Batch Jobs. The link to
> > > > > the FLIP proposal is here:
> > > > >
> > > >
> > >
> >
> https://urldefense.com/v3/__https://cwiki.apache.org/confluence/display/FLINK/FLIP-323*3A*Support*Attached*Execution*on*Flink*Application*Completion*for*Batch*Jobs__;JSsrKysrKysrKys!!IKRxdwAv5BmarQ!friFO6bJub5FKSLhPIzA6kv-7uffv-zXlv9ZLMKqj_xMcmZl62HhsgvwDXSCS5hfSeyHZgoAVSFg3fk7ChaAFNKi$
> > > > >
> > > > > This FLIP proposes adding back attached execution for Application
> > > > > Mode. In the past, attached execution was supported for the per-job
> > > > > mode, which will be deprecated, and we want to bring this feature
> > > > > back in Application mode.
> > > > >
> > > > > Please reply to this email thread and share your thoughts/opinions.
> > > > >
> > > > > Thank you!
> > > > >
> > > > > Allison Chang
> > > > >
> > > >
> > >
> >
>
