Lukas Heppe created ZEPPELIN-5666:
-------------------------------------

             Summary: Spark Additional jars: Does spark.jars override spark.jars.packages?
                 Key: ZEPPELIN-5666
                 URL: https://issues.apache.org/jira/browse/ZEPPELIN-5666
             Project: Zeppelin
          Issue Type: Bug
    Affects Versions: 0.10.1
            Reporter: Lukas Heppe


Hey,

I have the following setup:

Spark 3.1.2 standalone cluster (1 master, 2 workers)

Zeppelin 0.10.1

SparkInterpreterSetting:

{code:java}
SPARK_HOME               /opt/spark (points to spark-3.1.2)
spark.master             spark://my-spark-master-host:7077
spark.submit.deployMode  client
spark.jars.packages      com.datastax.spark:spark-cassandra-connector_2.12:3.1.0,eu.europa.ec.joinup.sd-dss:dss-xades:5.9
spark.jars.repositories  https://ec.europa.eu/cefdigital/artifact/content/repositories/esignaturedss
{code}

If I prepare a cell like
{code:java}
%spark
sc.version
{code}
I get the correct output (3.1.2). Cells which compute Pi from the Spark
examples also work.
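
For completeness, the sanity check is essentially the Pi example from the
Spark distribution; a minimal sketch of the cell I ran (reconstructed from
memory, the exact numbers do not matter):
{code:java}
%spark
// Monte Carlo estimate of Pi, as in the Spark examples.
val n = 100000
val count = sc.parallelize(1 to n).filter { _ =>
  val x = math.random
  val y = math.random
  x * x + y * y < 1
}.count()
println(s"Pi is roughly ${4.0 * count / n}")
{code}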

Likewise, if I use the Spark Cassandra connector, I get the first five rows as
expected:
{code:java}
%spark
import com.datastax.spark.connector._

val rdd = sc.cassandraTable("mykeyspace", "mytable")
println(rdd.take(5).toList)
{code}
However, if I try to add a local jar via the spark.jars property as follows
{code:java}
spark.jars file:///absolute/path/to/my/custom/jar
{code}
the jars provided via spark.jars.packages are no longer part of the
SparkContext. The jar is located at the same path on both the workers and the
Zeppelin host. If I run
{code:java}
%spark
sc.listJars().foreach(println)
{code}
without spark.jars set, I get the long list I expect (the artifacts from the
Datastax and EU repositories). However, if I restart the interpreter with the
spark.jars option set, the same cell lists only my custom jar. The logs show
the following:
{code:java}
INFO [2022-03-04 15:51:17,742] ({FIFOScheduler-interpreter_1815846009-Worker-1} 
SparkScala212Interpreter.scala[open]:68) - UserJars: 
file:/opt/zeppelin/interpreter/spark/spark-interpreter-0.10.1.jar:file:/opt/path/to/my/jar,
 LONG_LIST_OF_JARS_FROM_MAVEN.

...

Added JAR file:///path/to/my/custom/jar at 
spark://x.x.x.:xxx/jars/my-custom.jar with timestamp xxx {code}
So it seems the interpreter is aware of all of my jars, but it only adds the
ones from the spark.jars property, whereas I would expect all of the jars to
be added. If I omit the spark.jars option, I get an "Added JAR file:///..."
entry for each jar resolved from spark.jars.packages.
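
As a possible workaround (I have not fully verified this in my setup), I
assume I could leave spark.jars unset and register the local jar from a cell
at runtime instead:
{code:java}
%spark
// Workaround sketch (untested): with spark.jars unset, the Maven artifacts
// from spark.jars.packages stay on the context, and the local jar is
// registered at runtime. Note that sc.addJar only ships the jar to the
// executors; it does not put it on the driver classpath.
sc.addJar("file:///absolute/path/to/my/custom/jar")

// Both the Maven-resolved jars and the custom jar should now be listed.
sc.listJars().foreach(println)
{code}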

In a previous Zeppelin version (0.8.1), I was able to configure all of this via
the SPARK_SUBMIT_OPTIONS environment variable, e.g.
{code:java}
SPARK_SUBMIT_OPTIONS=" ... --jars /abs/path/to/custom --packages cassandraconn,etc.. --repositories additional-repo"
{code}
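
For reference, this is the mapping I assumed when converting the old
spark-submit flags to interpreter properties (my understanding of the Spark
configuration docs, so please correct me if it is wrong):
{code:java}
// Assumed flag-to-property mapping:
--jars          -> spark.jars
--packages      -> spark.jars.packages
--repositories  -> spark.jars.repositories
{code}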
Is this a bug, or am I converting these options the wrong way?

Thank you!