Arseniy Tashoyan created FLINK-23972:
----------------------------------------
Summary: NoClassDefFoundError: Jars from yarn.ship-files are missing in the classpath of the client
Key: FLINK-23972
URL: https://issues.apache.org/jira/browse/FLINK-23972
Project: Flink
Issue Type: Bug
Components: Client / Job Submission
Affects Versions: 1.13.2, 1.12.4
Environment: Flink 1.12.4 or 1.13.2
YARN

Running the application as follows:
{code:bash}
# Assumed helper (its definition is not part of the original report):
# joins arguments 2..n, using the first argument as the separator.
mk_string() { local IFS="$1"; shift; printf '%s\n' "$*"; }

# Find additional jars under $lib_dir
# Join jars with semicolon
jars=$(mk_string \; $(find "$lib_dir" -type f -name \*.jar))

# Provide additional jars via yarn.ship-files
/opt/flink/bin/flink \
  run \
  --target yarn-per-job \
  --detached \
  --class com.example.DummyMain \
  -Dyarn.ship-files="$jars" \
  dummy-app-1.0.jar
{code}
Reporter: Arseniy Tashoyan

Flink provides a way to inject additional jars into the classpath of the Job Manager and Task Managers: the configuration option [yarn.ship-files|https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/deployment/config/#yarn-ship-files]. However, an application fails to submit to YARN if its client-side code (the main() method) uses a class from the additional jars. The execution graph is built inside the client before the application is submitted to the YARN cluster, but the client (the command-line tool *flink*) does not have the additional jars on its classpath.

*Example with Scalaz library.*
{code:scala}
package com.example

import ALenses.aLens
import org.apache.flink.streaming.api.scala.{StreamExecutionEnvironment, createTypeInformation}
import scalaz.Lens
import scalaz.Lens.lensu

object DummyMain {

  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    /* Way 1: Using Scalaz inside the client code. */
    val inputData = Seq(1, 2, 3).map { i =>
      aLens.set(A(0), i)
    }
    /* Way 2: Not using Scalaz inside the client code. */
    // val inputData = Seq(1, 2, 3).map(A)

    val input = env.fromCollection(inputData)
    val output = input
      .map { a =>
        /* Using Scalaz inside Task Manager. */
        s"DUMMY: ${aLens.get(a)}"
      }
    output.print()
    env.execute()
    ()
  }

}

case class A(
  i: Int
)

object ALenses {
  val aLens: Lens[A, Int] = lensu[A, Int](
    set = (a, i1) => a.copy(i = i1),
    get = _.i
  )
}
{code}

When using "Way 1: Using Scalaz inside the client code", the application fails immediately, inside the client, before reaching YARN:
{code:none}
java.lang.NoClassDefFoundError: scalaz/Lens$
    at com.example.ALenses$.<init>(DummyMain.scala:38)
    at com.example.ALenses$.<clinit>(DummyMain.scala)
    at com.example.DummyMain$.$anonfun$main$1(DummyMain.scala:17)
    at com.example.DummyMain$.$anonfun$main$1$adapted(DummyMain.scala:16)
    at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:233)
    at scala.collection.immutable.List.foreach(List.scala:388)
    at scala.collection.TraversableLike.map(TraversableLike.scala:233)
    at scala.collection.TraversableLike.map$(TraversableLike.scala:226)
    at scala.collection.immutable.List.map(List.scala:294)
    at com.example.DummyMain$.main(DummyMain.scala:16)
    at com.example.DummyMain.main(DummyMain.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:349)
    at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:219)
    at org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:114)
    ...
{code}

Notably, when using "Way 2: Not using Scalaz inside the client code", the application runs successfully on YARN, and the subsequent use of Scalaz inside the Task Managers does not cause a failure.

This behavior is a blocker: the setting yarn.ship-files is effectively unusable whenever the job's main() method touches a class from the shipped jars. Flink provides no way to customize the classpath of the *flink* client.
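To illustrate the kind of hook that is missing, here is a minimal shell sketch of a client-side step that prepends an optional, user-supplied value to the classpath the launcher has already computed. The function name *build_client_classpath* is hypothetical, and *CUSTOM_CLIENT_CLASSPATH* is only the variable name proposed in this report; neither exists in Flink's *bin/config.sh*, and the sketch does not reproduce what *constructFlinkClassPath()* actually does.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: neither build_client_classpath nor CUSTOM_CLIENT_CLASSPATH
# exists in Flink's bin/config.sh; both names are illustrative.

# Prints the classpath for the client JVM: the optional, user-supplied
# CUSTOM_CLIENT_CLASSPATH followed by the base classpath given as $1.
build_client_classpath() {
    local base_classpath="$1"
    local custom_classpath="${CUSTOM_CLIENT_CLASSPATH:-}"

    if [ -n "$custom_classpath" ] && [ -n "$base_classpath" ]; then
        # Prepend user entries; the JVM searches classpath entries in order.
        printf '%s:%s\n' "$custom_classpath" "$base_classpath"
    elif [ -n "$custom_classpath" ]; then
        # Avoid a trailing ':' (an empty entry would mean the current directory).
        printf '%s\n' "$custom_classpath"
    else
        # Nothing to add: keep the launcher's classpath unchanged.
        printf '%s\n' "$base_classpath"
    fi
}
```

For example, `CUSTOM_CLIENT_CLASSPATH=/libs/scalaz-core.jar build_client_classpath /opt/flink/lib/flink-dist.jar` prints `/libs/scalaz-core.jar:/opt/flink/lib/flink-dist.jar` (both paths are illustrative). Prepending gives user entries precedence over Flink's own jars, which is convenient but means a conflicting user jar can shadow classes from *FLINK_HOME/lib*.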
The function *constructFlinkClassPath()* in *FLINK_HOME/bin/config.sh* does not provide any means for customization. Instead, this function could prepend to the resulting classpath a user-settable variable, *CUSTOM_CLIENT_CLASSPATH*.

There are two alternative ways to circumvent this problem, but neither is suitable:
1. Pack everything into a fat jar. Fat jars have numerous disadvantages: huge file size, many duplicated classes across different applications, untraceable dependencies, and classpath conflicts.
2. Put application-level jars into *FLINK_HOME/lib*. We share the same Flink installation among many applications belonging to different teams, and we do not want to pollute it with users' jars. Moreover, *FLINK_HOME* is located on a filesystem that is not writable by ordinary users.

Discussion on StackOverflow: [Flink on yarn: how to add external jars class path in local Client|https://stackoverflow.com/questions/49423861/flink-on-yarn-how-to-add-external-jars-class-path-in-local-client].

-- 
This message was sent by Atlassian Jira
(v8.3.4#803005)