Hi all,

I'm trying to use Spark to support users who are interactively refining the code that processes their data. As a concrete example, I might create an RDD[String] and then write several versions of a function to map over the RDD until I'm satisfied with the transformation. Right now, once I call addJar() to add one version of the jar to the SparkContext, there's no way to load a new version of that jar: I either have to rename the classes and functions involved, or re-create the SparkContext and lose my current work. Is there a better way to do this?
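Concretely, the loop looks something like this in the spark-shell (the jar paths and the com.example.Cleaner class are invented purely for illustration):

  // Build up a cached dataset I don't want to lose between iterations.
  val data: org.apache.spark.rdd.RDD[String] =
    sc.textFile("hdfs:///data/raw").cache()

  // Iteration 1: ship the first version of the transformation code.
  sc.addJar("/local/path/cleaner-v1.jar")
  val v1 = data.map(line => com.example.Cleaner.clean(line))
  v1.take(10).foreach(println)      // inspect the result

  // Iteration 2: fix a bug in Cleaner and rebuild it as cleaner-v2.jar.
  // Adding the new jar doesn't help, because the old Cleaner class already
  // on the executor classpath still wins, so the only options are to rename
  // the class (e.g. com.example.Cleaner2) or to restart the SparkContext
  // and lose `data`.
  sc.addJar("/local/path/cleaner-v2.jar")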
One idea that comes to mind is that we could add APIs to create "sub-contexts" from within a SparkContext. Jars added to a sub-context would get added to a child classloader on the executor, so that different sub-contexts could use classes with the same name while still being able to access the parent context's on-heap RDD objects.
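To make that concrete, here is a rough sketch of the kind of API I have in mind. None of this exists in Spark today; newSubContext and withClassLoader are placeholder names, and com.example.Cleaner is the same invented class as above:

  // Hypothetical API -- names and signatures are placeholders, not real Spark.
  val sub1 = sc.newSubContext()              // gets a child classloader on the executors
  sub1.addJar("/local/path/cleaner-v1.jar")
  val v1 = sub1.withClassLoader { data.map(com.example.Cleaner.clean _) }

  val sub2 = sc.newSubContext()              // sibling sub-context
  sub2.addJar("/local/path/cleaner-v2.jar")  // same class names, new bytecode
  val v2 = sub2.withClassLoader { data.map(com.example.Cleaner.clean _) }

  // v1 and v2 coexist because each sub-context resolves user classes through
  // its own child classloader, while both can still reach the parent
  // context's cached `data` RDD on the executor heap.

If this makes sense conceptually, I'd like to work on a PR to add such functionality to Spark.

Punya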