At the moment, your best bet for sharing a SparkContext across jobs is the Ooyala job server: https://github.com/ooyala/spark-jobserver
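If it helps, a job that runs against the shared, long-lived context looks roughly like this. This is only a minimal sketch against the job server's SparkJob / NamedRddSupport traits as they were around that time; the names SharedFileJob, "shared-file" and the input.path config key are placeholders of mine, and the exact API may differ a bit in your build:

import com.typesafe.config.Config
import org.apache.spark.SparkContext
import spark.jobserver.{SparkJob, SparkJobValid, SparkJobInvalid, SparkJobValidation, NamedRddSupport}

// Runs inside a long-lived context owned by the job server, so anything
// cached here outlives a single request and is visible to later jobs.
object SharedFileJob extends SparkJob with NamedRddSupport {

  // Reject the request up front if the caller didn't say which file to use.
  def validate(sc: SparkContext, config: Config): SparkJobValidation =
    if (config.hasPath("input.path")) SparkJobValid
    else SparkJobInvalid("missing input.path")

  def runJob(sc: SparkContext, config: Config): Any = {
    // getOrElseCreate loads and caches the file only for the first request;
    // every later job submitted to the same context reuses the cached RDD.
    val lines = namedRdds.getOrElseCreate("shared-file", {
      sc.textFile(config.getString("input.path")).cache()
    })
    lines.count()
  }
}

Your two client machines would then submit jobs over the job server's REST API against the same long-running context, instead of each holding its own SparkContext, so the 1 TB file is loaded and cached only once.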
It doesn't yet support Spark 1.0, though I did manage to amend it to get it to build and run on 1.0.

— Sent from Mailbox

On Wed, Jul 23, 2014 at 1:21 AM, Asaf Lahav <asaf.la...@gmail.com> wrote:

> Hi Folks,
> I have been trying to dig up some information on what the possibilities
> are when deploying more than one client process that consumes Spark.
> Let's say I have a Spark cluster of 10 servers, and would like to set up
> 2 additional servers which send requests to it through a Spark context,
> referencing one specific file of 1 TB of data.
> Each client process has its own SparkContext instance.
> Currently, the result is that the same file is loaded into memory twice,
> because the SparkContext resources are not shared between processes/JVMs.
> I wouldn't like to have that same file loaded over and over again with
> every new client being introduced.
> What would be the best practice here? Am I missing something?
> Thank you,
> Asaf