Hi Jiangjie,

Sorry for the late reply, and thank you for such a detailed response. As you say, there are three user-facing behaviors here, and I agree with you. I also agree with the goal of this FLIP, which is to clarify the client-side behavior.

However, as Weihua said, the config execution.attached is used not only in per-job mode but also in session mode. The FLIP says it applies only to per-job mode and will be removed in the future because per-job mode has been deprecated. I don't think this is correct, and we should change the description in the corresponding section of the FLIP. Since execution.attached is used in session mode, replacing it directly with client.attached.after.submission raises a compatibility issue, and I think the FLIP should make this clear.
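The compatibility concern can be made concrete with a small sketch. The code below is not Flink code. It is a minimal Java sketch, assuming a plain key/value configuration, of how the three behaviors could be resolved independently while an existing execution.attached setting keeps working as a fallback for client.attached.after.submission. The class and method names, the fallback rule, and all defaults are hypothetical. Only execution.attached is an existing Flink option; the other keys are the proposals from Jiangjie's email.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical model of the three independent behaviors discussed in this thread.
 * Not Flink code: option names other than execution.attached are proposals, and
 * all defaults below are assumptions.
 */
final class AttachSettings {
    // Proposed keys from the discussion (not final Flink options).
    static final String CLIENT_ATTACHED = "client.attached.after.submission";
    static final String CANCEL_JOB_ON_CLIENT_EXIT = "jobmanager.cancel-on-attached-client-exit";
    static final String SHUTDOWN_CLUSTER_ON_JOB_FINISH = "cluster.shutdown-on-first-job-finishes";
    // Existing key that users, including session-mode users, already set today.
    static final String LEGACY_ATTACHED = "execution.attached";

    final boolean clientAttached;             // behavior 1: client blocks until the job finishes
    final boolean cancelJobOnClientExit;      // behavior 2: job lifecycle bound to the client
    final boolean shutdownClusterOnJobFinish; // behavior 3: cluster lifecycle bound to the job

    private AttachSettings(boolean clientAttached, boolean cancelJobOnClientExit,
                           boolean shutdownClusterOnJobFinish) {
        this.clientAttached = clientAttached;
        this.cancelJobOnClientExit = cancelJobOnClientExit;
        this.shutdownClusterOnJobFinish = shutdownClusterOnJobFinish;
    }

    /** Reads key; if absent, reads fallbackKey (may be null); otherwise returns the default. */
    static boolean readBoolean(Map<String, String> conf, String key, String fallbackKey,
                               boolean defaultValue) {
        String value = conf.get(key);
        if (value == null && fallbackKey != null) {
            value = conf.get(fallbackKey);
        }
        return value == null ? defaultValue : Boolean.parseBoolean(value.trim());
    }

    static AttachSettings from(Map<String, String> conf) {
        // The new client-side key wins; an old execution.attached setting still applies.
        boolean attached = readBoolean(conf, CLIENT_ATTACHED, LEGACY_ATTACHED, false);
        // Behaviors 2 and 3 are read independently of each other and of behavior 1.
        boolean cancel = readBoolean(conf, CANCEL_JOB_ON_CLIENT_EXIT, null, false);
        boolean shutdown = readBoolean(conf, SHUTDOWN_CLUSTER_ON_JOB_FINISH, null, false);
        return new AttachSettings(attached, cancel, shutdown);
    }

    public static void main(String[] args) {
        // A job that only sets the old key keeps its attached client behavior.
        Map<String, String> conf = new HashMap<>();
        conf.put(LEGACY_ATTACHED, "true");
        AttachSettings s = from(conf);
        System.out.println(s.clientAttached + " " + s.cancelJobOnClientExit + " "
                + s.shutdownClusterOnJobFinish); // prints "true false false"
    }
}
```

If the rename goes ahead, Flink's `ConfigOption#withDeprecatedKeys` would likely be the natural place to express this kind of fallback. The sketch only shows the intended resolution order.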
Best,
Ron

On Mon, Aug 14, 2023 at 8:33 PM Becket Qin <becket....@gmail.com> wrote:
> Hi Ron and Weihua,
>
> Thanks for the feedback.
>
> There seem to be three user-visible behaviors that we are talking about:
>
> 1. The behavior on the client side, i.e. whether the client blocks until the job finishes.
>
> 2. The behavior of the submitted job, i.e. whether the job execution stops if the client is detached from the Flink cluster. In other words, whether the lifecycle of the job is bound to the connection status of the attached client. For example, one might want to keep a batch job running until it finishes even after the client connection is lost. But it makes sense to stop the job upon client connection loss if it is a streaming job that invokes collect().
>
> 3. The behavior of the Flink cluster (JM and TMs), i.e. whether the Flink cluster shuts down if the client is detached from it. In other words, whether the cluster lifecycle is bound to the job lifecycle. For dedicated clusters (application clusters or dedicated session clusters), the lifecycle of the cluster should be bound to the job lifecycle. But for shared session clusters, the lifecycle of the Flink cluster should be independent of the jobs running in it.
>
> As we can see, these three behaviors are largely independent, and the current configurations fail to support all the combinations of desired behaviors. Ideally there should be three separate configurations, for example:
> - client.attached.after.submission and client.heartbeat.timeout control the behavior on the client side.
> - jobmanager.cancel-on-attached-client-exit controls the behavior of the job when an attached client loses connection. The client heartbeat timeout and attached status will also be passed to the JM upon job submission.
> - cluster.shutdown-on-first-job-finishes (or jobmanager.shutdown-cluster-after-job-finishes) controls the cluster behavior after the job finishes normally or abnormally. This is a cluster-level setting rather than a job-level setting, so it can only be set when launching the cluster.
>
> The current code sort of combines configs 2 and 3 into execution.shutdown-on-attached-exit. This assumes that the lifecycle of the cluster is the same as that of the job when the client is attached. This FLIP does not intend to change that. But using the execution.attached config to control the client behavior looks misleading, so this FLIP proposes to replace it with the more intuitive config client.attached.after.submission. This makes it clear that it is a configuration controlling the client-side behavior, not the execution of the job.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Thu, Aug 10, 2023 at 10:34 PM Weihua Hu <huweihua....@gmail.com> wrote:
>
> > Hi Allison,
> >
> > Thanks for driving this FLIP. It's a valuable feature for batch jobs, and it helps keep "Drop Per-Job Mode [1]" going.
> >
> > +1 for this proposal.
> >
> > However, it seems that the change in this FLIP is not detailed enough. I have a few questions.
> >
> > 1. The config 'execution.attached' is not only used in per-job mode, but also in session mode to shut down the cluster. IMHO, it's better to keep this option name.
> >
> > 2. This FLIP only mentions YARN mode. I believe this feature should work in both YARN and Kubernetes mode.
> >
> > 3. Within attached mode, we support two features: execution.shutdown-on-attached-exit and client.heartbeat.timeout. These should also be taken into account.
> >
> > 4. Application Mode shuts down once the job has completed. So if the Flink client polls the job status via the REST API in attached mode, there is a chance that the client will not be able to retrieve the final job status. Perhaps FLINK-24113 [3] will help with this.
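Weihua's fourth point describes a race. In Application Mode the cluster may shut down right after the job reaches a terminal state, so a polling client can find the cluster gone before it has observed that state. The Java sketch below is hypothetical. The `StatusSource` interface stands in for a REST status request, and the simplified `Status` enum is not Flink's `JobStatus`. It only illustrates why an attached client needs a "final status unknown" outcome instead of assuming success or failure.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Optional;

/** Hypothetical attached client polling a job whose cluster may disappear after completion. */
final class AttachedPoller {
    // Simplified stand-in for a job status; not Flink's JobStatus enum.
    enum Status { RUNNING, FINISHED, FAILED, CANCELED }

    /** Stand-in for one status request; Optional.empty() models "cluster unreachable". */
    interface StatusSource {
        Optional<Status> poll();
    }

    /** Returns the terminal status, or empty if the cluster went away before one was seen. */
    static Optional<Status> awaitTermination(StatusSource source, int maxPolls) {
        for (int i = 0; i < maxPolls; i++) {
            Optional<Status> status = source.poll();
            if (!status.isPresent()) {
                // Cluster already shut down: the final status was never observed.
                return Optional.empty();
            }
            if (status.get() != Status.RUNNING) {
                return status; // terminal state observed in time
            }
            // A real client would wait between polls; omitted to keep the sketch deterministic.
        }
        return Optional.empty(); // gave up without seeing a terminal state
    }

    /** Replays scripted responses, then behaves like a cluster that has shut down. */
    static StatusSource scripted(List<Optional<Status>> responses) {
        Iterator<Optional<Status>> it = responses.iterator();
        return () -> it.hasNext() ? it.next() : Optional.<Status>empty();
    }

    public static void main(String[] args) {
        // Lucky timing: the client polls once more before the cluster exits.
        System.out.println(awaitTermination(scripted(Arrays.asList(
                Optional.of(Status.RUNNING), Optional.of(Status.FINISHED))), 10)); // prints Optional[FINISHED]
        // Unlucky timing: the job finished and the cluster exited between two polls.
        System.out.println(awaitTermination(scripted(Arrays.asList(
                Optional.of(Status.RUNNING))), 10)); // prints Optional.empty
    }
}
```

The explicit empty result is the point of the sketch. Reporting an unreachable cluster as either success or failure would give an orchestrator the wrong signal. Closing the gap requires the cluster side to keep the result available long enough.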
> >
> > [1] https://issues.apache.org/jira/browse/FLINK-26000
> > [2] https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/native_kubernetes/#session-mode
> > [3] https://issues.apache.org/jira/browse/FLINK-24113
> >
> > Best,
> > Weihua
> >
> > On Thu, Aug 10, 2023 at 10:47 AM liu ron <ron9....@gmail.com> wrote:
> >
> > > Hi Allison,
> > >
> > > Thanks for driving this proposal; it looks cool for batch jobs under application mode. But after reading your FLIP document and [1], I have a question. Why do you want to rename the execution.attached configuration to client.attached.after.submission and at the same time deprecate execution.attached? Based on your design, I understand that these two options play the same role. Introducing a new option would increase the cost of understanding and use for users, so why not follow the idea discussed in FLINK-25495 and make Application Mode support execution.attached?
> > >
> > > [1] https://issues.apache.org/jira/browse/FLINK-25495
> > >
> > > Best,
> > > Ron
> > >
> > > On Wed, Aug 9, 2023 at 2:07 AM Venkatakrishnan Sowrirajan <vsowr...@asu.edu> wrote:
> > >
> > > > This is definitely a useful feature, especially for Flink batch execution workloads using flow orchestrators like Airflow, Azkaban, Oozie, etc. Thanks for reviving this issue and starting a FLIP.
> > > >
> > > > Regards,
> > > > Venkata krishnan
> > > >
> > > > On Mon, Aug 7, 2023 at 4:09 PM Allison Chang <alch...@linkedin.com.invalid> wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > I am opening this thread to discuss this proposal to support attached execution on Flink Application completion for batch jobs. The link to the FLIP proposal is here:
> > > > >
> > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-323%3A+Support+Attached+Execution+on+Flink+Application+Completion+for+Batch+Jobs
> > > > >
> > > > > This FLIP proposes adding back attached execution for Application Mode. In the past, attached execution was supported for per-job mode, which will be deprecated, and we want to bring this feature back into Application Mode.
> > > > >
> > > > > Please reply to this email thread and share your thoughts/opinions.
> > > > >
> > > > > Thank you!
> > > > >
> > > > > Allison Chang