[ https://issues.apache.org/jira/browse/FLINK-35833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dylan Meissner updated FLINK-35833:
-----------------------------------
    Description:
FLINK-28915 added support for remote job jar fetching (HTTPS, S3, etc.) but broke the default behavior for local jars when running an application on a non-writable filesystem. ArtifactFetchManager now always attempts to create an artifact directory, even when the jar uses the "local" protocol.

Running an application on a non-writable filesystem is a common scenario in environments where the jar is published with the Docker container image.

A local jar has no need to be fetched to an intermediate directory, since it is already available on the local filesystem. The LocalArtifactFetcher does not write to the filesystem. However, the ArtifactFetchManager always attempts to create a directory before fetching, regardless of which fetcher would do the work.

On non-writable filesystems and in environments lacking permissions, the outcome is a runtime exception:

{{java.lang.RuntimeException: org.apache.flink.util.FlinkRuntimeException: Failed to create parent(s) for given base dir: /opt/flink/artifacts/<namespace>/<job name>}}
{{ at org.apache.flink.kubernetes.entrypoint.KubernetesApplicationClusterEntrypoint.fetchArtifacts(KubernetesApplicationClusterEntrypoint.java:158) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{ at org.apache.flink.kubernetes.entrypoint.KubernetesApplicationClusterEntrypoint.getPackagedProgramRetriever(KubernetesApplicationClusterEntrypoint.java:129) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{ at org.apache.flink.kubernetes.entrypoint.KubernetesApplicationClusterEntrypoint.getPackagedProgram(KubernetesApplicationClusterEntrypoint.java:111) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{ at org.apache.flink.kubernetes.entrypoint.KubernetesApplicationClusterEntrypoint.lambda$main$0(KubernetesApplicationClusterEntrypoint.java:85) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{ at org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:28) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{ at org.apache.flink.kubernetes.entrypoint.KubernetesApplicationClusterEntrypoint.main(KubernetesApplicationClusterEntrypoint.java:85) [flink-dist-1.19.1.jar:1.19.1]}}
{{Caused by: org.apache.flink.util.FlinkRuntimeException: Failed to create parent(s) for given base dir: /opt/flink/artifacts/app07772/sample-app-flink-1-19}}
{{ at org.apache.flink.client.program.artifact.ArtifactUtils.createMissingParents(ArtifactUtils.java:50) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{ at org.apache.flink.client.program.artifact.ArtifactFetchManager.fetchArtifacts(ArtifactFetchManager.java:123) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{ at org.apache.flink.kubernetes.entrypoint.KubernetesApplicationClusterEntrypoint.fetchArtifacts(KubernetesApplicationClusterEntrypoint.java:156) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{ ... 5 more}}
{{Caused by: java.io.IOException: Cannot create directory '/opt/flink/artifacts/<namespace>'.}}
{{ at org.apache.commons.io.FileUtils.mkdirs(FileUtils.java:2289) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{ at org.apache.commons.io.FileUtils.forceMkdir(FileUtils.java:1376) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{ at org.apache.commons.io.FileUtils.forceMkdirParent(FileUtils.java:1394) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{ at org.apache.flink.client.program.artifact.ArtifactUtils.createMissingParents(ArtifactUtils.java:46) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{ at org.apache.flink.client.program.artifact.ArtifactFetchManager.fetchArtifacts(ArtifactFetchManager.java:123) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{ at org.apache.flink.kubernetes.entrypoint.KubernetesApplicationClusterEntrypoint.fetchArtifacts(KubernetesApplicationClusterEntrypoint.java:156) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{ ... 5 more}}

A workaround is to always specify, via configuration, a location where the process is allowed to create directories, e.g. user.artifacts.base-dir: /tmp/foo.

A solution proposal is to let each fetcher decide whether to create the intermediate directory or fail.

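The proposal above could look roughly like the following. This is only a minimal sketch under stated assumptions, not Flink's actual code: the types FetcherSketch, Fetcher, LocalFetcher and CopyingFetcher and the method needsTargetDir() are hypothetical names invented for illustration, and the "remote" fetcher merely writes an empty placeholder file instead of downloading anything. The point it shows is that the directory is created only when the selected fetcher says it will write to it.

```java
import java.io.File;
import java.io.IOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: none of these types are Flink's real classes.
final class FetcherSketch {

    /** Assumed contract: each fetcher reports whether it writes into the target directory. */
    interface Fetcher {
        boolean supports(String uri);
        boolean needsTargetDir();
        File fetch(String uri, Path targetDir) throws IOException;
    }

    /** A local jar is already on disk, so nothing is copied and no directory is needed. */
    static final class LocalFetcher implements Fetcher {
        public boolean supports(String uri) { return uri.startsWith("local:"); }
        public boolean needsTargetDir() { return false; }
        public File fetch(String uri, Path targetDir) {
            return new File(URI.create(uri).getPath());
        }
    }

    /** Stand-in for a remote fetcher: writes an empty placeholder file into the target dir. */
    static final class CopyingFetcher implements Fetcher {
        public boolean supports(String uri) { return !uri.startsWith("local:"); }
        public boolean needsTargetDir() { return true; }
        public File fetch(String uri, Path targetDir) throws IOException {
            String name = uri.substring(uri.lastIndexOf('/') + 1);
            Path target = targetDir.resolve(name);
            Files.write(target, new byte[0]); // placeholder for a real download
            return target.toFile();
        }
    }

    /** Creates baseDir only if the fetcher chosen for a URI declares that it needs it. */
    static List<File> fetchAll(List<String> uris, Path baseDir, List<Fetcher> fetchers)
            throws IOException {
        List<File> result = new ArrayList<>();
        for (String uri : uris) {
            Fetcher fetcher = fetchers.stream()
                    .filter(f -> f.supports(uri))
                    .findFirst()
                    .orElseThrow(() -> new IllegalArgumentException("No fetcher for " + uri));
            if (fetcher.needsTargetDir()) {
                Files.createDirectories(baseDir); // fails here on a non-writable filesystem
            }
            result.add(fetcher.fetch(uri, baseDir));
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // A path below a regular file can never be created, which imitates a
        // non-writable artifact base directory.
        Path blocker = Files.createTempFile("blocker", ".tmp");
        Path baseDir = blocker.resolve("artifacts");
        List<Fetcher> fetchers = Arrays.asList(new LocalFetcher(), new CopyingFetcher());
        List<File> jars =
                fetchAll(Arrays.asList("local:///opt/flink/usrlib/my-app.jar"), baseDir, fetchers);
        System.out.println(jars.get(0) + " resolved, base dir created: " + Files.exists(baseDir));
    }
}
```

With this shape, a job whose only artifact is a local jar never touches the base directory, while a job with remote artifacts would still fail early if the directory cannot be created.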
  (was: FLINK-28915 added support for remote job jar fetching (HTTPS, S3, etc.) but broke the default behavior of local jar fetching when running an application on a non-writable filesystem.

Running an application on a non-writable filesystem is a common scenario in environments where the jar is published with the Docker container image. In this case, the jar URI is usually specified as a value like local:///opt/flink/usrlib/my-app.jar.

A local jar has no need to be fetched to an intermediate directory, since it is already available on the local filesystem. However, the ArtifactFetchManager always attempts to create a directory before fetching, regardless of which fetcher would do the work.

On a non-writable filesystem, the outcome is a runtime exception:

{{java.lang.RuntimeException: org.apache.flink.util.FlinkRuntimeException: Failed to create parent(s) for given base dir: /opt/flink/artifacts/<namespace>/<job name>}}
{{ at org.apache.flink.kubernetes.entrypoint.KubernetesApplicationClusterEntrypoint.fetchArtifacts(KubernetesApplicationClusterEntrypoint.java:158) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{ at org.apache.flink.kubernetes.entrypoint.KubernetesApplicationClusterEntrypoint.getPackagedProgramRetriever(KubernetesApplicationClusterEntrypoint.java:129) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{ at org.apache.flink.kubernetes.entrypoint.KubernetesApplicationClusterEntrypoint.getPackagedProgram(KubernetesApplicationClusterEntrypoint.java:111) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{ at org.apache.flink.kubernetes.entrypoint.KubernetesApplicationClusterEntrypoint.lambda$main$0(KubernetesApplicationClusterEntrypoint.java:85) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{ at org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:28) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{ at org.apache.flink.kubernetes.entrypoint.KubernetesApplicationClusterEntrypoint.main(KubernetesApplicationClusterEntrypoint.java:85) [flink-dist-1.19.1.jar:1.19.1]}}
{{Caused by: org.apache.flink.util.FlinkRuntimeException: Failed to create parent(s) for given base dir: /opt/flink/artifacts/app07772/sample-app-flink-1-19}}
{{ at org.apache.flink.client.program.artifact.ArtifactUtils.createMissingParents(ArtifactUtils.java:50) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{ at org.apache.flink.client.program.artifact.ArtifactFetchManager.fetchArtifacts(ArtifactFetchManager.java:123) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{ at org.apache.flink.kubernetes.entrypoint.KubernetesApplicationClusterEntrypoint.fetchArtifacts(KubernetesApplicationClusterEntrypoint.java:156) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{ ... 5 more}}
{{Caused by: java.io.IOException: Cannot create directory '/opt/flink/artifacts/<namespace>'.}}
{{ at org.apache.commons.io.FileUtils.mkdirs(FileUtils.java:2289) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{ at org.apache.commons.io.FileUtils.forceMkdir(FileUtils.java:1376) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{ at org.apache.commons.io.FileUtils.forceMkdirParent(FileUtils.java:1394) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{ at org.apache.flink.client.program.artifact.ArtifactUtils.createMissingParents(ArtifactUtils.java:46) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{ at org.apache.flink.client.program.artifact.ArtifactFetchManager.fetchArtifacts(ArtifactFetchManager.java:123) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{ at org.apache.flink.kubernetes.entrypoint.KubernetesApplicationClusterEntrypoint.fetchArtifacts(KubernetesApplicationClusterEntrypoint.java:156) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{ ... 5 more}}

A workaround is to always specify, via configuration, a location where the process is allowed to create directories, e.g. user.artifacts.base-dir: /tmp/foo.

A solution proposal is to let each fetcher decide whether to create the intermediate directory or fail.)

        Summary: ArtifactFetchManager always creates artifact dir  (was: ArtifactFetchManager always requires writable filesystem)

> ArtifactFetchManager always creates artifact dir
> ------------------------------------------------
>
>                 Key: FLINK-35833
>                 URL: https://issues.apache.org/jira/browse/FLINK-35833
>             Project: Flink
>          Issue Type: Bug
>          Components: Deployment / Kubernetes
>    Affects Versions: 1.19.0, 1.19.1
>            Reporter: Dylan Meissner
>            Priority: Critical

--
This message was sent by Atlassian Jira
(v8.20.10#820010)