Hi Spark people,

Sorry to bug everyone again about this, but does anyone have thoughts on whether sub-contexts would be a good way to solve this problem? I'm thinking of something like:
    class SparkContext {
      // ... stuff ...
      def inSubContext[T](fn: SparkContext => T): T
    }

This way, I could do something like

    val sc = /* get myself a spark context somehow */
    val rdd = sc.textFile("/stuff.txt")

    sc.inSubContext { sc1 =>
      sc1.addJar("extras-v1.jar")
      print(rdd.filter(/* fn that depends on jar */).count())
    }

    sc.inSubContext { sc2 =>
      sc2.addJar("extras-v2.jar")
      print(rdd.filter(/* fn that depends on jar */).count())
    }

... even if classes in extras-v1.jar and extras-v2.jar have name collisions.

Punya

From: Punya Biswal <pbis...@palantir.com>
Reply-To: <user@spark.apache.org>
Date: Sunday, March 16, 2014 at 11:09 AM
To: "user@spark.apache.org" <user@spark.apache.org>
Subject: Separating classloader management from SparkContexts

Hi all,

I'm trying to use Spark to support users who are interactively refining the code that processes their data. As a concrete example, I might create an RDD[String] and then write several versions of a function to map over the RDD until I'm satisfied with the transformation. Right now, once I do addJar() to add one version of the jar to the SparkContext, there's no way to add a new version of the jar unless I rename the classes and functions involved, or lose my current work by re-creating the SparkContext. Is there a better way to do this?

One idea that comes to mind is that we could add APIs to create "sub-contexts" from within a SparkContext. Jars added to a sub-context would get added to a child classloader on the executor, so that different sub-contexts could use classes with the same name while still being able to access on-heap objects for RDDs. If this makes sense conceptually, I'd like to work on a PR to add such functionality to Spark.

Punya
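PS: To make the classloader part of the quoted message concrete, here is a rough sketch of what I have in mind for the executor side. None of this is existing Spark code, and the jar names and the com.example.Extractor class are made up for illustration; the point is only that each sub-context gets its own child URLClassLoader over its jars, delegating to the shared executor classloader, so two versions of the same class name can coexist while Spark classes and cached RDD data stay shared in the parent.

    import java.io.File
    import java.net.URLClassLoader

    object SubContextSketch {
      // Each sub-context gets its own loader over its jars; the parent is the
      // executor's existing classloader, so Spark classes stay shared.
      def loaderFor(jarPaths: Seq[String], parent: ClassLoader): URLClassLoader =
        new URLClassLoader(jarPaths.map(p => new File(p).toURI.toURL).toArray, parent)

      def main(args: Array[String]): Unit = {
        val parent = getClass.getClassLoader      // stands in for the executor's loader
        val v1 = loaderFor(Seq("extras-v1.jar"), parent)   // hypothetical jar
        val v2 = loaderFor(Seq("extras-v2.jar"), parent)   // hypothetical jar

        // Assuming both jars exist on disk and define this (made-up) class, the
        // same fully-qualified name resolves to two distinct Class objects, one
        // per sub-context, so the two versions never collide.
        val c1 = Class.forName("com.example.Extractor", true, v1)
        val c2 = Class.forName("com.example.Extractor", true, v2)
        println(c1 == c2)   // false
      }
    }

The missing piece, presumably, is that when the executor deserializes a task submitted from a given sub-context, it would use that sub-context's loader, so closures pick up the right version of the class while still operating on the same cached RDD partitions.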