Hi Spark people,

Sorry to bug everyone again about this, but do people have any thoughts on
whether sub-contexts would be a good way to solve this problem? I'm thinking
of something like

class SparkContext {
  // ... stuff ...
  def inSubContext[T](fn: SparkContext => T): T
}

this way, I could do something like

val sc = /* get myself a spark context somehow */
val rdd = sc.textFile("/stuff.txt")
sc.inSubContext { sc1 =>
  sc1.addJar("extras-v1.jar")
  println(rdd.filter(/* fn that depends on extras-v1.jar */).count())
}
sc.inSubContext { sc2 =>
  sc2.addJar("extras-v2.jar")
  println(rdd.filter(/* fn that depends on extras-v2.jar */).count())
}

... even if classes in extras-v1.jar and extras-v2.jar have name collisions.

Punya

From:  Punya Biswal <pbis...@palantir.com>
Reply-To:  <user@spark.apache.org>
Date:  Sunday, March 16, 2014 at 11:09 AM
To:  "user@spark.apache.org" <user@spark.apache.org>
Subject:  Separating classloader management from SparkContexts

Hi all,

I'm trying to use Spark to support users who are interactively refining the
code that processes their data. As a concrete example, I might create an
RDD[String] and then write several versions of a function to map over the
RDD until I'm satisfied with the transformation. Right now, once I call
addJar() to add one version of the jar to the SparkContext, there's no way
to pick up a new version of that jar unless I rename the classes and
functions involved, or re-create the SparkContext and lose my current work.
Is there a better way to do this?
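
To make this concrete, here's a minimal sketch of the workflow I mean; the
master URL, jar names and the transformation itself are just placeholders:

import org.apache.spark.SparkContext

// Sketch of the interactive-refinement workflow described above.
// The master, "extras-v1.jar"/"extras-v2.jar" and the map are placeholders.
val sc = new SparkContext("local[4]", "interactive-refinement")
val lines = sc.textFile("/stuff.txt")

// First pass: ship v1 of my helper classes to the executors.
sc.addJar("extras-v1.jar")
println(lines.map(line => line.length).count())  // stand-in for a map that uses classes from v1

// Second pass: extras-v2.jar redefines the same class names, but the
// executors have already loaded the v1 classes, so I'd have to rename
// everything or rebuild the SparkContext to pick up the new code.
sc.addJar("extras-v2.jar")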

One idea that comes to mind is that we could add APIs to create
"sub-contexts" from within a SparkContext. Jars added to a sub-context would
get added to a child classloader on the executor, so that different
sub-contexts could use classes with the same name while still being able to
access on-heap objects for RDDs. If this makes sense conceptually, I'd like
to work on a PR to add such functionality to Spark.
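
In case it helps to make the classloader idea concrete, here's a standalone
sketch using plain JVM classloaders (not Spark internals); the jar paths and
class name are made up:

import java.io.File
import java.net.URLClassLoader

// Each sub-context would get its own child classloader on the executor.
// Children share the parent (so classes the parent already loaded, e.g. the
// RDD element types on the heap, stay visible to both), while classes from
// each child's own jars are resolved independently, so the two jars can
// define classes with identical names without colliding.
val parent = getClass.getClassLoader
val childV1 = new URLClassLoader(Array(new File("extras-v1.jar").toURI.toURL), parent)
val childV2 = new URLClassLoader(Array(new File("extras-v2.jar").toURI.toURL), parent)

val extractorV1 = childV1.loadClass("com.example.Extractor")
val extractorV2 = childV2.loadClass("com.example.Extractor")
println(extractorV1 == extractorV2)  // false: same name, loaded by different children

The child loaders delegate to the shared parent for anything they don't
define themselves, which is what would let each sub-context keep working
with the RDDs that already exist.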

Punya


