[jira] [Commented] (FLINK-28531) Shutdown cluster after history server archive finished

Aitozi (Jira) Tue, 12 Jul 2022 20:03:04 -0700


    [ https://issues.apache.org/jira/browse/FLINK-28531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17566123#comment-17566123 ]


Aitozi commented on FLINK-28531:
--------------------------------

I propose fixing this in two ways:

First, in the Dispatcher, also add the archive future to the 
jobTerminationFuture, so that archiving is guaranteed to finish before shutdown.
Second, avoid deleting the master pod in deregisterApp, and delete the 
cluster only after the ClusterEntrypoint terminationFuture has finished.
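The two proposed changes can be sketched with plain `java.util.concurrent.CompletableFuture`. This is not Flink code: `archiveExecutionGraph`, `deleteDeployment` and `runShutdown` are hypothetical stand-ins for the Dispatcher, ClusterEntrypoint and KubernetesResourceManagerDriver logic, and the 100 ms delay only simulates a slow archive upload. The point is the ordering: the deployment is deleted only after a termination future that already includes the archive future.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

public class ShutdownOrderingSketch {

    // Records the order in which the simulated shutdown steps run.
    static final List<String> events = new ArrayList<>();

    static synchronized void record(String event) {
        events.add(event);
    }

    // Hypothetical stand-in for HistoryServerArchivist#archiveExecutionGraph:
    // an asynchronous upload that completes about 100 ms later.
    static CompletableFuture<Void> archiveExecutionGraph() {
        return CompletableFuture.runAsync(
                () -> record("archive uploaded"),
                CompletableFuture.delayedExecutor(100, TimeUnit.MILLISECONDS));
    }

    // Hypothetical stand-in for removing the Kubernetes deployment (and thus the master pod).
    static void deleteDeployment() {
        record("deployment deleted");
    }

    static CompletableFuture<Void> runShutdown() {
        // In this simulation the job itself has already terminated.
        CompletableFuture<Void> jobTerminationFuture = CompletableFuture.completedFuture(null);

        // Change 1: fold the archive future into the job termination future, so
        // termination is only signalled once the archive upload has completed.
        CompletableFuture<Void> terminationFuture =
                CompletableFuture.allOf(jobTerminationFuture, archiveExecutionGraph())
                        .thenRun(() -> record("entrypoint terminated"));

        // Change 2: do not delete the deployment during deregistration; delete it
        // only after the entrypoint termination future has completed.
        return terminationFuture.thenRun(ShutdownOrderingSketch::deleteDeployment);
    }

    public static void main(String[] args) {
        runShutdown().join();
        System.out.println(events);
        // prints [archive uploaded, entrypoint terminated, deployment deleted]
    }
}
```

One consequence of using `allOf`: if the upload future completes exceptionally, the combined future does too and the `thenRun` stages are skipped, so a real implementation would still need to decide how a failed archive should affect cluster cleanup.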



> Shutdown cluster after history server archive finished
> ------------------------------------------------------
>
>                 Key: FLINK-28531
>                 URL: https://issues.apache.org/jira/browse/FLINK-28531
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>            Reporter: Aitozi
>            Priority: Major
>
> I ran into a problem where the job cluster may be shut down before the 
> history server archive file upload has finished.
> After some investigation, it appears to be caused by two things.
> First, nothing waits for {{HistoryServerArchivist#archiveExecutionGraph}} to 
> complete.
> Second, deregisterApp in 
> {{KubernetesResourceManagerDriver#deregisterApplication}} directly 
> removes the deployment. So in the ClusterEntrypoint shutdown flow, the 
> deployment deletion is triggered first, which deletes the master pod while 
> some operations/futures have not yet finished.
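The failure mode described in the issue can be reproduced in miniature. Again this is plain `CompletableFuture` with a hypothetical in-flight upload, not real Flink classes: because nothing waits on the archive future, the deployment deletion happens before the upload completes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class ShutdownRaceSketch {

    // Records the order of the simulated events (single-threaded here).
    static final List<String> events = new ArrayList<>();

    public static void main(String[] args) {
        // Hypothetical in-flight archive upload; it records its completion when done.
        CompletableFuture<Void> archiveUpload = new CompletableFuture<>();
        archiveUpload.thenRun(() -> events.add("archive uploaded"));

        // Current behaviour: deregistration deletes the deployment immediately,
        // without waiting for archiveUpload.
        events.add("deployment deleted");

        // The upload only finishes afterwards. In a real cluster the master pod
        // would already be gone, so it might never finish at all.
        archiveUpload.complete(null);

        System.out.println(events);
        // prints [deployment deleted, archive uploaded]
    }
}
```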



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
