[ https://issues.apache.org/jira/browse/FLINK-35833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dylan Meissner updated FLINK-35833:
-----------------------------------
    Description:
FLINK-28915 added support for remote job jar fetching (HTTPS, S3, etc.) but broke the default behavior of local jar fetching when running an application on a non-writable filesystem.

Running an application on a non-writable filesystem is a common scenario in environments where the jar is published inside the Docker container image. In this case, the jar URI is usually specified as a value like local://opt/flink/usrlib/my-app.jar.

A local jar does not get "fetched", so there is no need to create an intermediate directory to copy the fetched artifact to. However, the ArtifactFetchManager always attempts to create a directory before fetching, regardless of which fetcher would do the work. On a non-writable filesystem, the outcome is a runtime exception:

{{java.lang.RuntimeException: org.apache.flink.util.FlinkRuntimeException: Failed to create parent(s) for given base dir: /opt/flink/artifacts/<namespace>/<job name>}}
{{ at org.apache.flink.kubernetes.entrypoint.KubernetesApplicationClusterEntrypoint.fetchArtifacts(KubernetesApplicationClusterEntrypoint.java:158) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{ at org.apache.flink.kubernetes.entrypoint.KubernetesApplicationClusterEntrypoint.getPackagedProgramRetriever(KubernetesApplicationClusterEntrypoint.java:129) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{ at org.apache.flink.kubernetes.entrypoint.KubernetesApplicationClusterEntrypoint.getPackagedProgram(KubernetesApplicationClusterEntrypoint.java:111) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{ at org.apache.flink.kubernetes.entrypoint.KubernetesApplicationClusterEntrypoint.lambda$main$0(KubernetesApplicationClusterEntrypoint.java:85) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{ at org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:28) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{ at org.apache.flink.kubernetes.entrypoint.KubernetesApplicationClusterEntrypoint.main(KubernetesApplicationClusterEntrypoint.java:85) [flink-dist-1.19.1.jar:1.19.1]}}
{{Caused by: org.apache.flink.util.FlinkRuntimeException: Failed to create parent(s) for given base dir: /opt/flink/artifacts/app07772/sample-app-flink-1-19}}
{{ at org.apache.flink.client.program.artifact.ArtifactUtils.createMissingParents(ArtifactUtils.java:50) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{ at org.apache.flink.client.program.artifact.ArtifactFetchManager.fetchArtifacts(ArtifactFetchManager.java:123) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{ at org.apache.flink.kubernetes.entrypoint.KubernetesApplicationClusterEntrypoint.fetchArtifacts(KubernetesApplicationClusterEntrypoint.java:156) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{ ... 5 more}}
{{Caused by: java.io.IOException: Cannot create directory '/opt/flink/artifacts/<namespace>'.}}
{{ at org.apache.commons.io.FileUtils.mkdirs(FileUtils.java:2289) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{ at org.apache.commons.io.FileUtils.forceMkdir(FileUtils.java:1376) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{ at org.apache.commons.io.FileUtils.forceMkdirParent(FileUtils.java:1394) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{ at org.apache.flink.client.program.artifact.ArtifactUtils.createMissingParents(ArtifactUtils.java:46) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{ at org.apache.flink.client.program.artifact.ArtifactFetchManager.fetchArtifacts(ArtifactFetchManager.java:123) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{ at org.apache.flink.kubernetes.entrypoint.KubernetesApplicationClusterEntrypoint.fetchArtifacts(KubernetesApplicationClusterEntrypoint.java:156) ~[flink-dist-1.19.1.jar:1.19.1]}}
{{ ... 5 more}}

A workaround is to specify a location where the process is allowed to create directories, e.g., user.artifacts.base-dir: /tmp/foo.

A proposed solution is to let each fetcher decide whether to create the intermediate directory or fail.
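
The proposal to let each fetcher decide about the intermediate directory could look roughly like the sketch below. It is only an illustration under stated assumptions: FetcherSketch, Fetcher, LocalFetcher and CopyingFetcher are hypothetical names, not Flink's actual classes, and the copying fetcher merely stands in for a remote (HTTPS, S3) fetcher. The sketch uses the three-slash form local:///opt/... so that java.net.URI treats the whole location as the path.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch; none of these types are Flink's real classes.
final class FetcherSketch {

    /** Hypothetical fetcher contract: each implementation owns its directory handling. */
    interface Fetcher {
        Path fetch(URI uri, Path targetDir);
    }

    /** A local jar is already on disk: return its path and never touch targetDir. */
    static final class LocalFetcher implements Fetcher {
        @Override
        public Path fetch(URI uri, Path targetDir) {
            return Path.of(uri.getPath());
        }
    }

    /** Stand-in for a remote fetcher: it copies, so it creates targetDir or fails. */
    static final class CopyingFetcher implements Fetcher {
        @Override
        public Path fetch(URI uri, Path targetDir) {
            try {
                Files.createDirectories(targetDir);
                Path source = Path.of(uri.getPath());
                Path target = targetDir.resolve(source.getFileName());
                return Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
            } catch (IOException e) {
                throw new UncheckedIOException(
                        "Failed to fetch " + uri + " into " + targetDir, e);
            }
        }
    }

    /** The manager only selects a fetcher; it creates no directories up front. */
    static Path fetch(URI uri, Path targetDir) {
        Fetcher fetcher =
                "local".equals(uri.getScheme()) ? new LocalFetcher() : new CopyingFetcher();
        return fetcher.fetch(uri, targetDir);
    }
}
```

With this split, calling FetcherSketch.fetch with a local URI succeeds even when the base directory cannot be created, while a fetcher that really needs the directory still fails with an error that names the target directory.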
> ArtifactFetchManager always requires writable filesystem
> --------------------------------------------------------
>
>                 Key: FLINK-35833
>                 URL: https://issues.apache.org/jira/browse/FLINK-35833
>             Project: Flink
>          Issue Type: Bug
>          Components: Deployment / Kubernetes
>    Affects Versions: 1.19.0, 1.19.1
>            Reporter: Dylan Meissner
>            Priority: Critical
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)