[ https://issues.apache.org/jira/browse/FLINK-35833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dylan Meissner updated FLINK-35833:
-----------------------------------
    Description: 
FLINK-28915 added support for fetching remote job jars (HTTPS, S3, etc.) but 
broke the default behavior for local jars when the application runs on a 
non-writable filesystem. ArtifactFetchManager now always attempts to create an 
artifact directory, even when the jar uses the "local" protocol.

Running an application on a non-writable filesystem is common in environments 
where the jar is shipped inside the Docker container image.

A local jar does not need to be fetched to an intermediate directory, since it 
is already available on the local filesystem, and the LocalArtifactFetcher does 
not write to the filesystem. However, the ArtifactFetchManager always attempts 
to create a directory before fetching, regardless of which fetcher does the 
work. On a non-writable filesystem, or in an environment where the process 
lacks the required permissions, the outcome is a runtime exception:

{{java.lang.RuntimeException: org.apache.flink.util.FlinkRuntimeException: Failed to create parent(s) for given base dir: /opt/flink/artifacts/<namespace>/<job name>}}
{{    at org.apache.flink.kubernetes.entrypoint.KubernetesApplicationClusterEntrypoint.fetchArtifacts(KubernetesApplicationClusterEntrypoint.java:158) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{    at org.apache.flink.kubernetes.entrypoint.KubernetesApplicationClusterEntrypoint.getPackagedProgramRetriever(KubernetesApplicationClusterEntrypoint.java:129) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{    at org.apache.flink.kubernetes.entrypoint.KubernetesApplicationClusterEntrypoint.getPackagedProgram(KubernetesApplicationClusterEntrypoint.java:111) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{    at org.apache.flink.kubernetes.entrypoint.KubernetesApplicationClusterEntrypoint.lambda$main$0(KubernetesApplicationClusterEntrypoint.java:85) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{    at org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:28) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{    at org.apache.flink.kubernetes.entrypoint.KubernetesApplicationClusterEntrypoint.main(KubernetesApplicationClusterEntrypoint.java:85) [flink-dist-1.19.1.jar:1.19.1]}}
{{Caused by: org.apache.flink.util.FlinkRuntimeException: Failed to create parent(s) for given base dir: /opt/flink/artifacts/app07772/sample-app-flink-1-19}}
{{    at org.apache.flink.client.program.artifact.ArtifactUtils.createMissingParents(ArtifactUtils.java:50) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{    at org.apache.flink.client.program.artifact.ArtifactFetchManager.fetchArtifacts(ArtifactFetchManager.java:123) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{    at org.apache.flink.kubernetes.entrypoint.KubernetesApplicationClusterEntrypoint.fetchArtifacts(KubernetesApplicationClusterEntrypoint.java:156) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{    ... 5 more}}
{{Caused by: java.io.IOException: Cannot create directory '/opt/flink/artifacts/<namespace>'.}}
{{    at org.apache.commons.io.FileUtils.mkdirs(FileUtils.java:2289) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{    at org.apache.commons.io.FileUtils.forceMkdir(FileUtils.java:1376) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{    at org.apache.commons.io.FileUtils.forceMkdirParent(FileUtils.java:1394) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{    at org.apache.flink.client.program.artifact.ArtifactUtils.createMissingParents(ArtifactUtils.java:46) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{    at org.apache.flink.client.program.artifact.ArtifactFetchManager.fetchArtifacts(ArtifactFetchManager.java:123) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{    at org.apache.flink.kubernetes.entrypoint.KubernetesApplicationClusterEntrypoint.fetchArtifacts(KubernetesApplicationClusterEntrypoint.java:156) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{    ... 5 more}}

A workaround is to always configure a base directory in which the process is 
allowed to create directories, e.g., user.artifacts.base-dir: /tmp/foo.

A proposed solution is to let each fetcher decide whether to create the 
intermediate directory or to fail.
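
The proposal could look roughly like the sketch below. It is only an 
illustration under stated assumptions, not Flink's actual code: the Fetcher 
interface, its needsTargetDir() method, the two fetcher classes and the 
fetchArtifact helper are all hypothetical names introduced here. The point it 
shows is that the directory is created only when the selected fetcher asks for 
one, so a "local" jar never touches the filesystem.

```java
import java.io.File;
import java.io.IOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical sketch; none of these types are Flink's real classes. */
public class ArtifactDirSketch {

    /** Assumed contract: a fetcher states whether it needs a writable target directory. */
    interface Fetcher {
        boolean needsTargetDir();

        File fetch(URI uri, Path targetDir) throws IOException;
    }

    /** A local jar is already on disk, so nothing is written. */
    static final class LocalFetcher implements Fetcher {
        @Override
        public boolean needsTargetDir() {
            return false;
        }

        @Override
        public File fetch(URI uri, Path targetDir) {
            return new File(uri.getPath());
        }
    }

    /** Stand-in for a remote fetcher: writes an empty file instead of downloading. */
    static final class CopyingFetcher implements Fetcher {
        @Override
        public boolean needsTargetDir() {
            return true;
        }

        @Override
        public File fetch(URI uri, Path targetDir) throws IOException {
            Path fileName = Paths.get(uri.getPath()).getFileName();
            Path target = targetDir.resolve(fileName);
            Files.write(target, new byte[0]); // placeholder for the real download
            return target.toFile();
        }
    }

    /** Creates the base directory only if the chosen fetcher requires it. */
    static File fetchArtifact(Fetcher fetcher, URI uri, Path baseDir) throws IOException {
        if (fetcher.needsTargetDir()) {
            Files.createDirectories(baseDir); // may still fail on a read-only filesystem
        }
        return fetcher.fetch(uri, baseDir);
    }

    public static void main(String[] args) throws IOException {
        Path baseDir = Files.createTempDirectory("artifacts-sketch").resolve("ns").resolve("job");

        File local = fetchArtifact(
                new LocalFetcher(), URI.create("local:///opt/flink/usrlib/my-app.jar"), baseDir);
        System.out.println(local + " baseDirExists=" + Files.exists(baseDir));

        File remote = fetchArtifact(
                new CopyingFetcher(), URI.create("https://example.invalid/jars/my-app.jar"), baseDir);
        System.out.println(remote.getName() + " baseDirExists=" + Files.exists(baseDir));
    }
}
```

With this shape, a remote fetcher on a non-writable filesystem still fails, 
which is the expected outcome there, while the local case no longer depends on 
write permissions.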

  was:
FLINK-28915 added support for fetching remote job jars (HTTPS, S3, etc.) but 
broke the default behavior of local jar fetching when the application runs on a 
non-writable filesystem.

Running an application on a non-writable filesystem is common in environments 
where the jar is shipped inside the Docker container image. In this case, the 
jar URI is usually specified as a value like 
local://opt/flink/usrlib/my-app.jar.

A local jar does not need to be fetched to an intermediate directory, since it 
is already available on the local filesystem. However, the ArtifactFetchManager 
always attempts to create a directory before fetching, regardless of which 
fetcher does the work. On a non-writable filesystem, the outcome is a runtime 
exception:

{{java.lang.RuntimeException: org.apache.flink.util.FlinkRuntimeException: Failed to create parent(s) for given base dir: /opt/flink/artifacts/<namespace>/<job name>}}
{{    at org.apache.flink.kubernetes.entrypoint.KubernetesApplicationClusterEntrypoint.fetchArtifacts(KubernetesApplicationClusterEntrypoint.java:158) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{    at org.apache.flink.kubernetes.entrypoint.KubernetesApplicationClusterEntrypoint.getPackagedProgramRetriever(KubernetesApplicationClusterEntrypoint.java:129) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{    at org.apache.flink.kubernetes.entrypoint.KubernetesApplicationClusterEntrypoint.getPackagedProgram(KubernetesApplicationClusterEntrypoint.java:111) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{    at org.apache.flink.kubernetes.entrypoint.KubernetesApplicationClusterEntrypoint.lambda$main$0(KubernetesApplicationClusterEntrypoint.java:85) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{    at org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:28) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{    at org.apache.flink.kubernetes.entrypoint.KubernetesApplicationClusterEntrypoint.main(KubernetesApplicationClusterEntrypoint.java:85) [flink-dist-1.19.1.jar:1.19.1]}}
{{Caused by: org.apache.flink.util.FlinkRuntimeException: Failed to create parent(s) for given base dir: /opt/flink/artifacts/app07772/sample-app-flink-1-19}}
{{    at org.apache.flink.client.program.artifact.ArtifactUtils.createMissingParents(ArtifactUtils.java:50) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{    at org.apache.flink.client.program.artifact.ArtifactFetchManager.fetchArtifacts(ArtifactFetchManager.java:123) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{    at org.apache.flink.kubernetes.entrypoint.KubernetesApplicationClusterEntrypoint.fetchArtifacts(KubernetesApplicationClusterEntrypoint.java:156) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{    ... 5 more}}
{{Caused by: java.io.IOException: Cannot create directory '/opt/flink/artifacts/<namespace>'.}}
{{    at org.apache.commons.io.FileUtils.mkdirs(FileUtils.java:2289) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{    at org.apache.commons.io.FileUtils.forceMkdir(FileUtils.java:1376) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{    at org.apache.commons.io.FileUtils.forceMkdirParent(FileUtils.java:1394) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{    at org.apache.flink.client.program.artifact.ArtifactUtils.createMissingParents(ArtifactUtils.java:46) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{    at org.apache.flink.client.program.artifact.ArtifactFetchManager.fetchArtifacts(ArtifactFetchManager.java:123) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{    at org.apache.flink.kubernetes.entrypoint.KubernetesApplicationClusterEntrypoint.fetchArtifacts(KubernetesApplicationClusterEntrypoint.java:156) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{    ... 5 more}}

A workaround is to always configure a base directory in which the process is 
allowed to create directories, e.g., user.artifacts.base-dir: /tmp/foo.

A proposed solution is to let each fetcher decide whether to create the 
intermediate directory or to fail.

        Summary: ArtifactFetchManager always creates artifact dir  (was: ArtifactFetchManager always requires writable filesystem)

> ArtifactFetchManager always creates artifact dir
> ------------------------------------------------
>
>                 Key: FLINK-35833
>                 URL: https://issues.apache.org/jira/browse/FLINK-35833
>             Project: Flink
>          Issue Type: Bug
>          Components: Deployment / Kubernetes
>    Affects Versions: 1.19.0, 1.19.1
>            Reporter: Dylan Meissner
>            Priority: Critical



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
