[ https://issues.apache.org/jira/browse/FLINK-26772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mika Naylor updated FLINK-26772:
--------------------------------
    Attachment: testcluster-599f4d476b-bghw5_log.txt

> Kubernetes Native in HA Application Mode does not retry resource cleanup
> ------------------------------------------------------------------------
>
>                 Key: FLINK-26772
>                 URL: https://issues.apache.org/jira/browse/FLINK-26772
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 1.15.0
>            Reporter: Mika Naylor
>            Priority: Blocker
>         Attachments: testcluster-599f4d476b-bghw5_log.txt
>
>
> I set up a scenario in which a k8s native cluster running in Application Mode
> used an s3 bucket for its high availability storage directory, with the
> hadoop plugin. The credentials the cluster used give it permission to write
> to the bucket, but not to delete, so cleaning up the blob/jobgraph will fail.
> I expected that when trying to clean up the HA resources, it would retry the
> cleanup. I even configured this explicitly:
> {{cleanup-strategy: fixed-delay}}
> {{cleanup-strategy.fixed-delay.attempts: 100}}
> {{cleanup-strategy.fixed-delay.delay: 10 s}}
> However, the behaviour I observed is that the blob and jobgraph cleanup is
> only attempted once. After this failure, I observe in the logs that:
> {{2022-03-21 09:34:40,634 INFO org.apache.flink.client.deployment.application.ApplicationDispatcherBootstrap [] - Application completed SUCCESSFULLY}}
> {{2022-03-21 09:34:40,635 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - Shutting KubernetesApplicationClusterEntrypoint down with application status SUCCEEDED. Diagnostics null.}}
> After which, the cluster receives a SIGTERM and exits.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
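
For readers unfamiliar with the fixed-delay settings quoted in the report, the behaviour the reporter expected can be sketched roughly as follows. This is only an illustration in Python, not Flink's implementation: the helper `retry_with_fixed_delay` and the `delete_denied` callback are hypothetical names, and treating `attempts` as the total number of tries (rather than the number of retries after the first try) is an assumption.

```python
import time
from typing import Callable


def retry_with_fixed_delay(
    cleanup: Callable[[], None],
    attempts: int,
    delay_seconds: float,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call `cleanup` until it succeeds or `attempts` tries are used up.

    Returns the number of tries made. If every try fails, the error
    from the last try is re-raised. (Hypothetical helper, not a Flink API.)
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            cleanup()
            return attempt
        except Exception:
            if attempt == attempts:
                raise  # retries exhausted: surface the last failure
            sleep(delay_seconds)  # wait the same fixed delay before every retry


def demo() -> int:
    """Simulate a bucket whose credentials allow writes but not deletes."""
    calls = []

    def delete_denied() -> None:
        calls.append(1)
        raise PermissionError("delete not permitted")

    try:
        # Values mirror the reported settings; sleeping is stubbed out
        # so the demo finishes immediately.
        retry_with_fixed_delay(
            delete_denied, attempts=100, delay_seconds=10, sleep=lambda _: None
        )
    except PermissionError:
        pass
    return len(calls)
```

Under these assumptions `demo()` counts 100 cleanup tries before giving up, whereas the report describes a single try followed by cluster shutdown.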