GitHub user sryza commented on a diff in the pull request:

    https://github.com/apache/spark/pull/119#discussion_r10507644
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
    @@ -130,6 +130,16 @@ class SparkContext(
     
       val isLocal = (master == "local" || master.startsWith("local["))
     
    +  // Create a classLoader for use by the driver so that jars added via addJar are available to the
    +  // driver.  Do this before all other initialization so that any thread pools created for this
    +  // SparkContext uses the class loader.
    +  // Note that this is config-enabled as classloaders can introduce subtle side effects
    +  private[spark] val classLoader = if (conf.getBoolean("spark.driver.loadAddedJars", false)) {
    +    val loader = new SparkURLClassLoader(Array.empty[URL], this.getClass.getClassLoader)
    +    Thread.currentThread.setContextClassLoader(loader)
    --- End diff --
    
    Ah, ok, I understand now.  In that case, to make things simpler, would it
    make sense not to load the jars into the current thread's context class
    loader, and to load them only for the SparkContext/executors?  Classloader
    behavior can be confusing to deal with, and keeping it as isolated as
    possible could make things easier for users.  This would also line up a
    little more closely with how the MR distributed cache works: jars that get
    added to it don't become accessible to driver code.
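
    A minimal sketch of the isolation idea, for illustration only.  It assumes
    a plain java.net.URLClassLoader as a stand-in for the PR's
    SparkURLClassLoader, and `withContextClassLoader` is a hypothetical helper
    that exists in neither the PR nor Spark.  The loader is installed only
    around the work that needs it, and the caller's context class loader is
    restored afterwards, so driver code outside that scope is unaffected.

```scala
import java.net.{URL, URLClassLoader}

object ClassLoaderIsolationSketch {
  // Hypothetical helper: runs `body` with `loader` as the current thread's
  // context class loader, then restores whatever loader was set before.
  def withContextClassLoader[T](loader: ClassLoader)(body: => T): T = {
    val thread = Thread.currentThread
    val previous = thread.getContextClassLoader
    thread.setContextClassLoader(loader)
    try body finally thread.setContextClassLoader(previous)
  }

  def main(args: Array[String]): Unit = {
    val callerLoader = Thread.currentThread.getContextClassLoader

    // Stand-in for the loader that would hold jars added via addJar.
    // No URLs are registered here; this only demonstrates the scoping.
    val addedJarsLoader =
      new URLClassLoader(Array.empty[URL], getClass.getClassLoader)

    val seenInside = withContextClassLoader(addedJarsLoader) {
      Thread.currentThread.getContextClassLoader
    }

    // Inside the scope the added-jars loader is active ...
    assert(seenInside eq addedJarsLoader)
    // ... and afterwards the caller's loader is back in place.
    assert(Thread.currentThread.getContextClassLoader eq callerLoader)
  }
}
```

    The trade-off is the one raised above: classes from added jars would not
    be visible to user code running on the driver thread outside that scope.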


