[ https://issues.apache.org/jira/browse/FLINK-26772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mika Naylor updated FLINK-26772:
--------------------------------
    Priority: Critical  (was: Blocker)

> HA Application Mode does not retry resource cleanup
> ---------------------------------------------------
>
>                 Key: FLINK-26772
>                 URL: https://issues.apache.org/jira/browse/FLINK-26772
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 1.15.0
>            Reporter: Mika Naylor
>            Priority: Critical
>             Fix For: 1.15.0
>
>         Attachments: testcluster-599f4d476b-bghw5_log.txt
>
>
> I set up a scenario in which a k8s native cluster running in Application Mode 
> used an s3 bucket for its high availability storage directory, with the 
> hadoop plugin. The credentials the cluster used give it permission to write 
> to the bucket, but not to delete from it, so cleaning up the blob/jobgraph will fail.
> I expected that when trying to clean up the HA resources, it would attempt to 
> retry the cleanup. I even configured this explicitly:
> {{cleanup-strategy: fixed-delay}}
> {{cleanup-strategy.fixed-delay.attempts: 100}}
> {{cleanup-strategy.fixed-delay.delay: 10 s}}
> However, the behaviour I observed is that the blob and jobgraph cleanup is 
> only attempted once. After this failure, I observe in the logs that:
> {{2022-03-21 09:34:40,634 INFO 
> org.apache.flink.client.deployment.application.ApplicationDispatcherBootstrap 
> [] - Application completed SUCCESSFULLY}}
> {{2022-03-21 09:34:40,635 INFO 
> org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - Shutting 
> KubernetesApplicationClusterEntrypoint down with application status 
> SUCCEEDED. Diagnostics null.}}
> After which, the cluster receives a SIGTERM and exits.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
