Thanks Vadim & Jörn... I will look into those.

jg

> On Jun 20, 2017, at 2:12 PM, Vadim Semenov <vadim.seme...@datadoghq.com> wrote:
> 
> You can launch one permanent Spark context and then execute your jobs within 
> that context. Since they'll all be running in the same context, they can 
> share data easily.
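> 
> Roughly, the pattern looks like this. A minimal PySpark sketch, where the 
> app name and the "results_a" view name are purely illustrative:
> 
>     from pyspark.sql import SparkSession
> 
>     # One long-lived session that every submitted job shares.
>     spark = SparkSession.builder.appName("shared-context").getOrCreate()
> 
>     # "Job A" produces a DataFrame, caches it, and registers it by name.
>     df_a = spark.range(100).withColumnRenamed("id", "a")
>     df_a.cache()
>     df_a.createOrReplaceTempView("results_a")
> 
>     # A later job running in the same context reads it back by name,
>     # with no disk round trip in between.
>     df_again = spark.table("results_a")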
> 
> These two projects provide the functionality that you need:
> https://github.com/spark-jobserver/spark-jobserver#persistent-context-mode---faster--required-for-related-jobs
> https://github.com/cloudera/livy#post-sessions
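> 
> With Livy specifically, jobs talk to the shared session over REST. A rough 
> sketch in Python, where the host, port, and statement code are illustrative 
> and the endpoints are the ones from the Livy README linked above:
> 
>     import json
>     import requests
> 
>     host = "http://livy-server:8998"
>     headers = {"Content-Type": "application/json"}
> 
>     # Create a long-lived interactive session (POST /sessions).
>     r = requests.post(host + "/sessions",
>                       data=json.dumps({"kind": "pyspark"}),
>                       headers=headers)
>     session_url = host + r.headers["Location"]
> 
>     # Run code inside that session (POST /sessions/{id}/statements); state,
>     # including cached RDDs and DataFrames, persists between statements.
>     requests.post(session_url + "/statements",
>                   data=json.dumps(
>                       {"code": "rdd = sc.parallelize(range(100)).cache()"}),
>                   headers=headers)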
> 
> On Tue, Jun 20, 2017 at 1:46 PM, Jean Georges Perrin <j...@jgp.net> wrote:
> Hey,
> 
> Here is my need: program A does something on a set of data and produces 
> results, program B does the same on another set, and finally program C 
> combines the results of A and B. Of course, the easy way is to dump 
> everything to disk after A and B are done, but I wanted to avoid that.
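> 
> Concretely, the flow I'm after would look something like this if it all ran 
> in one application; the input paths and the join key "key" are placeholders:
> 
>     from pyspark.sql import SparkSession
> 
>     spark = SparkSession.builder.appName("abc").getOrCreate()
> 
>     df_a = spark.read.csv("input_a.csv", header=True)  # program A's result
>     df_b = spark.read.csv("input_b.csv", header=True)  # program B's result
> 
>     # Keep the intermediates in memory instead of dumping them to disk.
>     df_a.cache()
>     df_b.cache()
> 
>     # Program C combines them, here with a join on a shared key.
>     df_c = df_a.join(df_b, on="key")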
> 
> I was thinking of creating a temp view, but I do not really like the temp 
> aspect of it ;). Any ideas? (They are all worth sharing.)
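> 
> For reference, the temp-view route I was considering looks like this (the 
> view names are placeholders); a global temp view (Spark 2.1+) at least 
> widens the scope from one session to the whole application:
> 
>     # Session-scoped: visible only to this SparkSession, dropped when
>     # the session stops.
>     df_a.createOrReplaceTempView("results_a")
> 
>     # Application-scoped (Spark 2.1+): visible to any session in the same
>     # application, queried through the reserved global_temp database.
>     df_a.createGlobalTempView("results_a_global")
>     spark.sql("SELECT * FROM global_temp.results_a_global")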
> 
> jg
> 
