Hello, 

I am prototyping a change in the behavior of the spark.jars conf for my
use case.  The spark.jars conf is used to specify a comma-separated list of
jars to include on the driver and executor classpaths.

*Current behavior:*  The spark.jars conf value is not read until after the
JVM has already started and the system classloader has already been
initialized, so the jars added through this conf are effectively "appended"
to the Spark classpath. This means that Spark looks in its default classpath
first and only then at the paths specified in the spark.jars conf.
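
For concreteness, here is a minimal sketch of how spark.jars might be set
programmatically (the jar path is hypothetical; in practice the conf is
usually supplied at submit time):

    import org.apache.spark.SparkConf
    import org.apache.spark.SparkContext

    // Hypothetical jar path. With the current behavior this jar ends up
    // *after* Spark's default classpath, so any class that also exists on
    // the default classpath is resolved from there, not from this jar.
    val conf = new SparkConf()
      .setAppName("jars-precedence-example")
      .set("spark.jars", "/somepath/sample-jar-2.0.0.jar")
    val sc = new SparkContext(conf)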

*Proposed prototype:* I am proposing a new behavior where spark.jars takes
precedence over the Spark default classpath in terms of how jars are
discovered. This can be achieved by using the
spark.{driver,executor}.extraClassPath conf. That conf modifies the actual
launch command of the driver (or executors), so its path is "prepended" to
the classpath and thus takes precedence over the default classpath. Could
the behavior of spark.jars be changed by adding its value to the value of
spark.{driver,executor}.extraClassPath during argument parsing in
SparkSubmitArguments.scala
<https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala#L151>,
so that we achieve the precedence order: jars specified in spark.jars >
spark.{driver,executor}.extraClassPath > spark default classpath
(left-to-right precedence)?

*Pseudo sample code:*
In loadEnvironmentArguments() in SparkSubmitArguments.scala
<https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala#L151>:

    if (jars != null) {
      // spark.jars is a comma-separated list, while extraClassPath entries
      // are joined with the platform path separator, so split and rejoin.
      val jarsAsClassPath = jars.split(",").mkString(java.io.File.pathSeparator)
      // Prepend the jars so they take precedence over any pre-existing
      // spark.driver.extraClassPath entries (spark.jars > extraClassPath).
      driverExtraClassPath = if (driverExtraClassPath != null) {
        jarsAsClassPath + java.io.File.pathSeparator + driverExtraClassPath
      } else {
        jarsAsClassPath
      }
    }
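
The executor side could presumably be handled in the same place by folding
spark.jars into spark.executor.extraClassPath. The sketch below assumes the
value is merged into the parsed Spark properties map; the collection and
field names are my assumptions, not the actual SparkSubmitArguments
internals:

    if (jars != null) {
      val jarsAsClassPath = jars.split(",").mkString(java.io.File.pathSeparator)
      // Same idea for executors: spark.jars entries go first so they win
      // over any pre-existing spark.executor.extraClassPath entries.
      val merged = sparkProperties.get("spark.executor.extraClassPath") match {
        case Some(existing) => jarsAsClassPath + java.io.File.pathSeparator + existing
        case None => jarsAsClassPath
      }
      sparkProperties("spark.executor.extraClassPath") = merged
    }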


*As an example*, consider the following jars:
- sample-jar-1.0.0.jar, present in Spark's default classpath
- sample-jar-2.0.0.jar, present on all nodes of the cluster at path
  /<somepath>/
- new-jar-1.0.0.jar, present on all nodes of the cluster at path /<somepath>/
  (and not in Spark's default classpath)

Now consider two scenarios in which Spark jobs are submitted with the
following spark.jars conf values (see the attached table):

<http://apache-spark-developers-list.1001551.n3.nabble.com/file/t3705/Capture.png>
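
As a side note, one way to verify which copy of a class actually wins in each
scenario (the class name here is just a placeholder):

    // Prints the jar (or directory) the class was loaded from.
    val location = Class.forName("com.example.SampleClass")
      .getProtectionDomain.getCodeSource.getLocation
    println(s"Loaded from: $location")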
 


What are your thoughts on this? Could this have any undesired side-effects?
Or has this already been explored and there are some known issues with this
approach?

Thanks,
Nupur


