Lukas Heppe created ZEPPELIN-5666:
-------------------------------------

             Summary: Spark Additional jars: Does spark.jars override spark.jars.packages?
                 Key: ZEPPELIN-5666
                 URL: https://issues.apache.org/jira/browse/ZEPPELIN-5666
             Project: Zeppelin
          Issue Type: Bug
    Affects Versions: 0.10.1
            Reporter: Lukas Heppe

Hey,

I have the following setup:

Spark 3.1.2 Standalone Cluster (1 Master, 2 Workers)
Zeppelin 0.10.1

SparkInterpreterSetting:
{code:java}
SPARK_HOME                /opt/spark (points to spark-3.1.2)
spark.master              spark://my-spark-master-host:7077
spark.submit.deployMode   client
spark.jars.packages       com.datastax.spark:spark-cassandra-connector_2.12:3.1.0,eu.europa.ec.joinup.sd-dss:dss-xades:5.9
spark.jars.repositories   https://ec.europa.eu/cefdigital/artifact/content/repositories/esignaturedss/
{code}

If I prepare a cell like
{code:java}
%spark
sc.version
{code}
I get the correct output (3.1.2). Cells that compute Pi from the Spark examples also seem to work. If I use the Spark Cassandra connector, I get the first five rows as expected:
{code:java}
%spark
import com.datastax.spark.connector._
val rdd = sc.cassandraTable("mykeyspace", "mytable")
println(rdd.take(5).toList)
{code}

However, if I try to add a local jar via the spark.jars property as follows
{code:java}
spark.jars file:///absolute/path/to/my/custom/jar
{code}
the jars provided via spark.jars.packages are no longer part of the SparkContext. The jar is located at the same path on the workers and on the Zeppelin host.

If I run
{code:java}
%spark
sc.listJars().foreach(println)
{code}
without spark.jars set, I get a long list as expected (jars from the DataStax and EU repositories). However, if I restart the interpreter with the spark.jars option set, the same cell prints only my custom jar.

The logs show the following:
{code:java}
INFO [2022-03-04 15:51:17,742] ({FIFOScheduler-interpreter_1815846009-Worker-1} SparkScala212Interpreter.scala[open]:68) - UserJars: file:/opt/zeppelin/interpreter/spark/spark-interpreter-0.10.1.jar:file:/opt/path/to/my/jar, LONG_LIST_OF_JARS_FROM_MAVEN.
...
Added JAR file:///path/to/my/custom/jar at spark://x.x.x.:xxx/jars/my-custom.jar with timestamp xxx
{code}

So it seems that the interpreter is aware of all of my jars, but adds only the ones from the spark.jars property to the SparkContext, whereas I would expect all of them to be added. If I omit the spark.jars option, the log contains an "Added JAR file:///..." entry for each jar resolved from spark.jars.packages.

In a previous Zeppelin version (0.8.1), I was able to configure all of this via the SPARK_SUBMIT_OPTIONS environment variable, like this:
{code:java}
SPARK_SUBMIT_OPTIONS=" ... --jars /abs/path/to/custom --packages cassandraconn,etc.. --repositories additional-repo"
{code}

Is this a bug, or am I converting these options incorrectly?

Thank you!

--
This message was sent by Atlassian Jira
(v8.20.1#820001)