[ https://issues.apache.org/jira/browse/FLINK-35358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17846914#comment-17846914 ]
Rasmus Thygesen commented on FLINK-35358:
-----------------------------------------

Yep I saw it, code looks good to me. In your test, is it correct that you are setting the usrlib to the parent directory so it looks from /opt/flink instead of /opt/flink/usrlib and therefore it needs to search recursively to also find the artifacts inside /opt/flink/usrlib?

> Breaking change when loading artifacts
> --------------------------------------
>
>                 Key: FLINK-35358
>                 URL: https://issues.apache.org/jira/browse/FLINK-35358
>             Project: Flink
>          Issue Type: Bug
>          Components: Client / Job Submission, flink-docker
>    Affects Versions: 1.19.0
>            Reporter: Rasmus Thygesen
>            Priority: Not a Priority
>              Labels: pull-request-available
>             Fix For: 1.19.1
>
>
> We have been using the following code snippet in our Dockerfiles for running
> a Flink job in application mode
>
> {code:java}
> FROM flink:1.18.1-scala_2.12-java17
> COPY --from=build /app/target/my-job*.jar /opt/flink/usrlib/artifacts/my-job.jar
> USER flink
> {code}
>
> Which has been working since at least around Flink 1.14, but the 1.19 update
> has broken our Dockerfiles. The fix is to put the jar file a step further out
> so the code snippet becomes
>
> {code:java}
> FROM flink:1.18.1-scala_2.12-java17
> COPY --from=build /app/target/my-job*.jar /opt/flink/usrlib/my-job.jar
> USER flink
> {code}
>
> We have not spent too much time looking into what the cause is, but we get
> the stack trace
>
> {code:java}
> myjob-jobmanager-1 | org.apache.flink.util.FlinkException: Could not load the provided entrypoint class.
> myjob-jobmanager-1 |     at org.apache.flink.client.program.DefaultPackagedProgramRetriever.getPackagedProgram(DefaultPackagedProgramRetriever.java:230) ~[flink-dist-1.19.0.jar:1.19.0]
> myjob-jobmanager-1 |     at org.apache.flink.container.entrypoint.StandaloneApplicationClusterEntryPoint.getPackagedProgram(StandaloneApplicationClusterEntryPoint.java:149) ~[flink-dist-1.19.0.jar:1.19.0]
> myjob-jobmanager-1 |     at org.apache.flink.container.entrypoint.StandaloneApplicationClusterEntryPoint.lambda$main$0(StandaloneApplicationClusterEntryPoint.java:90) ~[flink-dist-1.19.0.jar:1.19.0]
> myjob-jobmanager-1 |     at org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:28) ~[flink-dist-1.19.0.jar:1.19.0]
> myjob-jobmanager-1 |     at org.apache.flink.container.entrypoint.StandaloneApplicationClusterEntryPoint.main(StandaloneApplicationClusterEntryPoint.java:89) [flink-dist-1.19.0.jar:1.19.0]
> myjob-jobmanager-1 | Caused by: org.apache.flink.client.program.ProgramInvocationException: The program's entry point class 'my.company.job.MyJob' was not found in the jar file.
> myjob-jobmanager-1 |     at org.apache.flink.client.program.PackagedProgram.loadMainClass(PackagedProgram.java:481) ~[flink-dist-1.19.0.jar:1.19.0]
> myjob-jobmanager-1 |     at org.apache.flink.client.program.PackagedProgram.<init>(PackagedProgram.java:153) ~[flink-dist-1.19.0.jar:1.19.0]
> myjob-jobmanager-1 |     at org.apache.flink.client.program.PackagedProgram.<init>(PackagedProgram.java:65) ~[flink-dist-1.19.0.jar:1.19.0]
> myjob-jobmanager-1 |     at org.apache.flink.client.program.PackagedProgram$Builder.build(PackagedProgram.java:691) ~[flink-dist-1.19.0.jar:1.19.0]
> myjob-jobmanager-1 |     at org.apache.flink.client.program.DefaultPackagedProgramRetriever.getPackagedProgram(DefaultPackagedProgramRetriever.java:228) ~[flink-dist-1.19.0.jar:1.19.0]
> myjob-jobmanager-1 |     ... 4 more
> myjob-jobmanager-1 | Caused by: java.lang.ClassNotFoundException: my.company.job.MyJob
> myjob-jobmanager-1 |     at java.net.URLClassLoader.findClass(Unknown Source) ~[?:?]
> myjob-jobmanager-1 |     at java.lang.ClassLoader.loadClass(Unknown Source) ~[?:?]
> myjob-jobmanager-1 |     at org.apache.flink.util.FlinkUserCodeClassLoader.loadClassWithoutExceptionHandling(FlinkUserCodeClassLoader.java:67) ~[flink-dist-1.19.0.jar:1.19.0]
> myjob-jobmanager-1 |     at org.apache.flink.util.ChildFirstClassLoader.loadClassWithoutExceptionHandling(ChildFirstClassLoader.java:74) ~[flink-dist-1.19.0.jar:1.19.0]
> myjob-jobmanager-1 |     at org.apache.flink.util.FlinkUserCodeClassLoader.loadClass(FlinkUserCodeClassLoader.java:51) ~[flink-dist-1.19.0.jar:1.19.0]
> myjob-jobmanager-1 |     at java.lang.ClassLoader.loadClass(Unknown Source) ~[?:?]
> myjob-jobmanager-1 |     at org.apache.flink.util.FlinkUserCodeClassLoaders$SafetyNetWrapperClassLoader.loadClass(FlinkUserCodeClassLoaders.java:197) ~[flink-dist-1.19.0.jar:1.19.0]
> myjob-jobmanager-1 |     at java.lang.Class.forName0(Native Method) ~[?:?]
> myjob-jobmanager-1 |     at java.lang.Class.forName(Unknown Source) ~[?:?]
> myjob-jobmanager-1 |     at org.apache.flink.client.program.PackagedProgram.loadMainClass(PackagedProgram.java:479) ~[flink-dist-1.19.0.jar:1.19.0]
> myjob-jobmanager-1 |     at org.apache.flink.client.program.PackagedProgram.<init>(PackagedProgram.java:153) ~[flink-dist-1.19.0.jar:1.19.0]
> myjob-jobmanager-1 |     at org.apache.flink.client.program.PackagedProgram.<init>(PackagedProgram.java:65) ~[flink-dist-1.19.0.jar:1.19.0]
> myjob-jobmanager-1 |     at org.apache.flink.client.program.PackagedProgram$Builder.build(PackagedProgram.java:691) ~[flink-dist-1.19.0.jar:1.19.0]
> myjob-jobmanager-1 |     at org.apache.flink.client.program.DefaultPackagedProgramRetriever.getPackagedProgram(DefaultPackagedProgramRetriever.java:228) ~[flink-dist-1.19.0.jar:1.19.0]
> myjob-jobmanager-1 |     ... 4 more
> {code}
>
> I have changed some text in the stack trace to keep it anonymous so it is
> possible there is a typo but that is not the issue. As you can see, the stack
> trace leads to PackagedProgram and DefaultPackagedProgramRetriever to which
> the only commits after Flink 1.18 are
> [PackagedProgram commit|https://github.com/apache/flink/commit/d0ce5349fdf1a611518eba20a169c475ee0b46c5]
> and
> [DefaultPackagedProgramRetriever commit|https://github.com/apache/flink/commit/e63aa12252843d0098a56f3091b28d48aff5b5af]
> and we suspect the culprit is the latter, specifically
> [this line|https://github.com/apache/flink/commit/e63aa12252843d0098a56f3091b28d48aff5b5af#diff-11b5162d6745014c68e96303d26c71bdb88bac068c27834dbdbb7c9089ffbe9fL227]
> which we think has made the artifact check non-recursive. We assume it is
> intended to have your artifacts directly in /opt/flink/usrlib without the
> artifacts directory so we are planning on changing that for our Dockerfiles
> anyway, but it is still a breaking change so we wanted to make an issue on it
> first.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
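
For readers following the thread, here is a minimal sketch of the behaviour the issue suspects: a flat scan of the user library directory misses a jar placed in a subdirectory such as artifacts/, while a recursive scan finds it. This is not Flink's implementation. The class name UsrlibScanSketch and the helpers findJars and demo are hypothetical, and a temporary directory stands in for /opt/flink/usrlib.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration only; it does not reproduce Flink's actual code.
class UsrlibScanSketch {

    // Returns the jar files found at most maxDepth levels below dir.
    // maxDepth = 1 looks only at the direct children of dir.
    static List<Path> findJars(Path dir, int maxDepth) {
        try (Stream<Path> paths = Files.walk(dir, maxDepth)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(".jar"))
                    .sorted()
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Builds a temporary layout mirroring usrlib/artifacts/my-job.jar and
    // compares a flat scan with a recursive scan.
    static String demo() {
        try {
            Path usrlib = Files.createTempDirectory("usrlib");
            Path artifacts = Files.createDirectory(usrlib.resolve("artifacts"));
            Files.createFile(artifacts.resolve("my-job.jar"));

            int flat = findJars(usrlib, 1).size();
            int recursive = findJars(usrlib, Integer.MAX_VALUE).size();
            return "flat=" + flat + " recursive=" + recursive;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "flat=0 recursive=1"
    }
}
```

With the jar nested one level down, the flat scan finds nothing to put on the user classpath. That would be consistent with the ClassNotFoundException in the stack trace, assuming the lookup did become non-recursive.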