Hi all,

I'm trying to use Spark to support users who are interactively refining the code that processes their data. As a concrete example, I might create an RDD[String] and then write several versions of a function to map over the RDD until I'm satisfied with the transformation. Right now, once I call addJar() to add one version of the jar to the SparkContext, there's no way to load a new version of that jar: I either have to rename the classes and functions involved, or re-create the SparkContext and lose my current work. Is there a better way to do this?
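Concretely, the loop looks something like this in the spark-shell (the jar paths and the com.example.Cleaner class are invented purely for illustration):

  // Build up a cached dataset I don't want to lose between iterations.
  val data: org.apache.spark.rdd.RDD[String] =
    sc.textFile("hdfs:///data/raw").cache()

  // Iteration 1: ship the first version of the transformation code.
  sc.addJar("/local/path/cleaner-v1.jar")
  val v1 = data.map(line => com.example.Cleaner.clean(line))
  v1.take(10).foreach(println)      // inspect the result

  // Iteration 2: fix a bug in Cleaner and rebuild it as cleaner-v2.jar.
  // Adding the new jar doesn't help, because the old Cleaner class already
  // on the executor classpath still wins, so the only options are to rename
  // the class (e.g. com.example.Cleaner2) or to restart the SparkContext
  // and lose `data`.
  sc.addJar("/local/path/cleaner-v2.jar")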
One idea that comes to mind is that we could add APIs to create "sub-contexts" from within a SparkContext. Jars added to a sub-context would get added to a child classloader on the executor, so that different sub-contexts could use classes with the same name while still being able to access the parent context's on-heap RDD objects.
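To make that concrete, here is a rough sketch of the kind of API I have in mind. None of this exists in Spark today; newSubContext and withClassLoader are placeholder names, and com.example.Cleaner is the same invented class as above:

  // Hypothetical API -- names and signatures are placeholders, not real Spark.
  val sub1 = sc.newSubContext()              // gets a child classloader on the executors
  sub1.addJar("/local/path/cleaner-v1.jar")
  val v1 = sub1.withClassLoader { data.map(com.example.Cleaner.clean _) }

  val sub2 = sc.newSubContext()              // sibling sub-context
  sub2.addJar("/local/path/cleaner-v2.jar")  // same class names, new bytecode
  val v2 = sub2.withClassLoader { data.map(com.example.Cleaner.clean _) }

  // v1 and v2 coexist because each sub-context resolves user classes through
  // its own child classloader, while both can still reach the parent
  // context's cached `data` RDD on the executor heap.

If this makes sense conceptually, I'd like to work on a PR to add such functionality to Spark.

Punya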