[ 
https://issues.apache.org/jira/browse/FLINK-35833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dylan Meissner updated FLINK-35833:
-----------------------------------
    Description: 
FLINK-28915 added support for fetching remote job jars (HTTPS, S3, etc.) but 
broke the default behavior of local jar fetching when an application runs on a 
non-writable filesystem.

Running an application on a non-writable filesystem is a common scenario in 
environments where the jar is published with the Docker container image. In 
this case, the jar URI is usually specified as a value like 
local:///opt/flink/usrlib/my-app.jar.
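
The number of slashes after the scheme matters when such a URI is parsed. The following is a minimal sketch using only java.net.URI, not Flink code; the class and method names are hypothetical. It shows that the two-slash form treats "opt" as the authority, while the three-slash form keeps the full absolute path:

```java
import java.net.URI;

/** Hypothetical helper for illustration only; not part of Flink. */
final class LocalUriSketch {

    /** Returns the path component of the given URI string. */
    static String pathOf(String uri) {
        return URI.create(uri).getPath();
    }

    /** Returns the authority component, or null if the URI has none. */
    static String authorityOf(String uri) {
        return URI.create(uri).getAuthority();
    }

    public static void main(String[] args) {
        // Three slashes: empty authority, full absolute path.
        System.out.println(pathOf("local:///opt/flink/usrlib/my-app.jar"));
        // prints /opt/flink/usrlib/my-app.jar

        // Two slashes: "opt" is parsed as the authority and dropped from the path.
        System.out.println(authorityOf("local://opt/flink/usrlib/my-app.jar"));
        // prints opt
        System.out.println(pathOf("local://opt/flink/usrlib/my-app.jar"));
        // prints /flink/usrlib/my-app.jar
    }
}
```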

A local jar has no need to be fetched to an intermediate directory, since it is 
already available on the local filesystem. However, the ArtifactFetchManager 
always attempts to create a directory before fetching, regardless of which 
fetcher will do the work. On a non-writable filesystem, the outcome is a 
runtime exception:

{{java.lang.RuntimeException: org.apache.flink.util.FlinkRuntimeException: Failed to create parent(s) for given base dir: /opt/flink/artifacts/<namespace>/<job name>}}
{{    at org.apache.flink.kubernetes.entrypoint.KubernetesApplicationClusterEntrypoint.fetchArtifacts(KubernetesApplicationClusterEntrypoint.java:158) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{    at org.apache.flink.kubernetes.entrypoint.KubernetesApplicationClusterEntrypoint.getPackagedProgramRetriever(KubernetesApplicationClusterEntrypoint.java:129) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{    at org.apache.flink.kubernetes.entrypoint.KubernetesApplicationClusterEntrypoint.getPackagedProgram(KubernetesApplicationClusterEntrypoint.java:111) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{    at org.apache.flink.kubernetes.entrypoint.KubernetesApplicationClusterEntrypoint.lambda$main$0(KubernetesApplicationClusterEntrypoint.java:85) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{    at org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:28) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{    at org.apache.flink.kubernetes.entrypoint.KubernetesApplicationClusterEntrypoint.main(KubernetesApplicationClusterEntrypoint.java:85) [flink-dist-1.19.1.jar:1.19.1]}}
{{Caused by: org.apache.flink.util.FlinkRuntimeException: Failed to create parent(s) for given base dir: /opt/flink/artifacts/app07772/sample-app-flink-1-19}}
{{    at org.apache.flink.client.program.artifact.ArtifactUtils.createMissingParents(ArtifactUtils.java:50) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{    at org.apache.flink.client.program.artifact.ArtifactFetchManager.fetchArtifacts(ArtifactFetchManager.java:123) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{    at org.apache.flink.kubernetes.entrypoint.KubernetesApplicationClusterEntrypoint.fetchArtifacts(KubernetesApplicationClusterEntrypoint.java:156) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{    ... 5 more}}
{{Caused by: java.io.IOException: Cannot create directory '/opt/flink/artifacts/<namespace>'.}}
{{    at org.apache.commons.io.FileUtils.mkdirs(FileUtils.java:2289) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{    at org.apache.commons.io.FileUtils.forceMkdir(FileUtils.java:1376) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{    at org.apache.commons.io.FileUtils.forceMkdirParent(FileUtils.java:1394) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{    at org.apache.flink.client.program.artifact.ArtifactUtils.createMissingParents(ArtifactUtils.java:46) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{    at org.apache.flink.client.program.artifact.ArtifactFetchManager.fetchArtifacts(ArtifactFetchManager.java:123) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{    at org.apache.flink.kubernetes.entrypoint.KubernetesApplicationClusterEntrypoint.fetchArtifacts(KubernetesApplicationClusterEntrypoint.java:156) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{    ... 5 more}}

A workaround is to point the artifact base directory at a location where the 
process is allowed to create directories, e.g., 
user.artifacts.base-dir: /tmp/foo.

A proposed solution is to let each fetcher decide whether to create the 
intermediate directory or fail.
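
The proposal can be illustrated with a minimal, self-contained sketch. All names here (SketchFetcher, LocalSketchFetcher, CopyingSketchFetcher, SketchFetchManager, needsTargetDir) are hypothetical and are not Flink's actual ArtifactFetcher API; the remote fetcher is a stand-in that writes an empty placeholder file instead of downloading anything, and the base directory in main is an example value. The point is only that the base directory is created when the selected fetcher asks for it, so an actual fix in Flink may look different.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

/** Hypothetical fetcher contract for illustration; not Flink's ArtifactFetcher API. */
interface SketchFetcher {
    boolean supports(URI uri);

    /** True if this fetcher copies data into the target directory. */
    boolean needsTargetDir();

    Path fetch(URI uri, Path targetDir) throws IOException;
}

/** Local jars are already on disk, so this fetcher only resolves the path. */
final class LocalSketchFetcher implements SketchFetcher {
    public boolean supports(URI uri) {
        return "local".equals(uri.getScheme());
    }

    public boolean needsTargetDir() {
        return false;
    }

    public Path fetch(URI uri, Path targetDir) {
        return Paths.get(uri.getPath());
    }
}

/** Stand-in for a remote fetcher: writes an empty placeholder instead of downloading. */
final class CopyingSketchFetcher implements SketchFetcher {
    public boolean supports(URI uri) {
        return "https".equals(uri.getScheme());
    }

    public boolean needsTargetDir() {
        return true;
    }

    public Path fetch(URI uri, Path targetDir) throws IOException {
        Path target = targetDir.resolve(Paths.get(uri.getPath()).getFileName());
        return Files.write(target, new byte[0]);
    }
}

/** Creates the base directory only when the selected fetcher asks for it. */
final class SketchFetchManager {
    private final List<SketchFetcher> fetchers;
    private final Path baseDir;

    SketchFetchManager(List<SketchFetcher> fetchers, Path baseDir) {
        this.fetchers = fetchers;
        this.baseDir = baseDir;
    }

    Path fetch(URI uri) throws IOException {
        for (SketchFetcher fetcher : fetchers) {
            if (fetcher.supports(uri)) {
                if (fetcher.needsTargetDir()) {
                    // Throws an IOException if the directory cannot be created,
                    // e.g. on a non-writable filesystem.
                    Files.createDirectories(baseDir);
                }
                return fetcher.fetch(uri, baseDir);
            }
        }
        throw new IllegalArgumentException("No fetcher supports " + uri);
    }

    public static void main(String[] args) throws IOException {
        SketchFetchManager manager = new SketchFetchManager(
                Arrays.asList(new LocalSketchFetcher(), new CopyingSketchFetcher()),
                Paths.get("/opt/flink/artifacts/ns/job"));
        // No directory is created for a local jar, so this works on a read-only image.
        System.out.println(manager.fetch(URI.create("local:///opt/flink/usrlib/my-app.jar")));
    }
}
```

With this shape, a local:// jar resolves without touching the filesystem, while a fetcher that really needs a target directory still fails with an IOException when the directory cannot be created.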

  was:
FLINK-28915 added support for fetching remote job jars (HTTPS, S3, etc.) but 
broke the default behavior of local jar fetching when an application runs on a 
non-writable filesystem.

Running an application on a non-writable filesystem is a common scenario in 
environments where the jar is published with the Docker container image. In 
this case, the jar URI is usually specified as a value like 
local:///opt/flink/usrlib/my-app.jar.

A local jar does not get "fetched", so there is no need to create an 
intermediate directory to copy the fetched artifact to. However, the 
ArtifactFetchManager always attempts to create a directory before fetching, 
regardless of which fetcher will do the work. On a non-writable filesystem, 
the outcome is a runtime exception:

{{java.lang.RuntimeException: org.apache.flink.util.FlinkRuntimeException: Failed to create parent(s) for given base dir: /opt/flink/artifacts/<namespace>/<job name>}}
{{    at org.apache.flink.kubernetes.entrypoint.KubernetesApplicationClusterEntrypoint.fetchArtifacts(KubernetesApplicationClusterEntrypoint.java:158) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{    at org.apache.flink.kubernetes.entrypoint.KubernetesApplicationClusterEntrypoint.getPackagedProgramRetriever(KubernetesApplicationClusterEntrypoint.java:129) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{    at org.apache.flink.kubernetes.entrypoint.KubernetesApplicationClusterEntrypoint.getPackagedProgram(KubernetesApplicationClusterEntrypoint.java:111) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{    at org.apache.flink.kubernetes.entrypoint.KubernetesApplicationClusterEntrypoint.lambda$main$0(KubernetesApplicationClusterEntrypoint.java:85) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{    at org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:28) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{    at org.apache.flink.kubernetes.entrypoint.KubernetesApplicationClusterEntrypoint.main(KubernetesApplicationClusterEntrypoint.java:85) [flink-dist-1.19.1.jar:1.19.1]}}
{{Caused by: org.apache.flink.util.FlinkRuntimeException: Failed to create parent(s) for given base dir: /opt/flink/artifacts/app07772/sample-app-flink-1-19}}
{{    at org.apache.flink.client.program.artifact.ArtifactUtils.createMissingParents(ArtifactUtils.java:50) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{    at org.apache.flink.client.program.artifact.ArtifactFetchManager.fetchArtifacts(ArtifactFetchManager.java:123) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{    at org.apache.flink.kubernetes.entrypoint.KubernetesApplicationClusterEntrypoint.fetchArtifacts(KubernetesApplicationClusterEntrypoint.java:156) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{    ... 5 more}}
{{Caused by: java.io.IOException: Cannot create directory '/opt/flink/artifacts/<namespace>'.}}
{{    at org.apache.commons.io.FileUtils.mkdirs(FileUtils.java:2289) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{    at org.apache.commons.io.FileUtils.forceMkdir(FileUtils.java:1376) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{    at org.apache.commons.io.FileUtils.forceMkdirParent(FileUtils.java:1394) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{    at org.apache.flink.client.program.artifact.ArtifactUtils.createMissingParents(ArtifactUtils.java:46) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{    at org.apache.flink.client.program.artifact.ArtifactFetchManager.fetchArtifacts(ArtifactFetchManager.java:123) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{    at org.apache.flink.kubernetes.entrypoint.KubernetesApplicationClusterEntrypoint.fetchArtifacts(KubernetesApplicationClusterEntrypoint.java:156) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{    ... 5 more}}

A workaround is to point the artifact base directory at a location where the 
process is allowed to create directories, e.g., 
user.artifacts.base-dir: /tmp/foo.

A proposed solution is to let each fetcher decide whether to create the 
intermediate directory or fail.


> ArtifactFetchManager always requires writable filesystem
> --------------------------------------------------------
>
>                 Key: FLINK-35833
>                 URL: https://issues.apache.org/jira/browse/FLINK-35833
>             Project: Flink
>          Issue Type: Bug
>          Components: Deployment / Kubernetes
>    Affects Versions: 1.19.0, 1.19.1
>            Reporter: Dylan Meissner
>            Priority: Critical



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
