Re: Saving multiple outputs in the same job

2016-03-09 Thread Jeff Zhang
Spark will skip a stage if it has already been computed by another job, which means the common parent RDD of the jobs only needs to be computed once. But these are still multiple sequential jobs, not concurrent jobs.

On Wed, Mar 9, 2016 at 3:29 PM, Jan Štěrba wrote:
> Hi Andy,
>
> it's nice to see that we are not

Re: Saving multiple outputs in the same job

2016-03-08 Thread Jan Štěrba
Hi Andy,

It's nice to see that we are not the only ones with the same issues. So far we have not gone as far as you have. What we have done is cache whatever dataframes/RDDs are shared for computing different outputs. This has brought us quite a speedup, but we still see that saving some l
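The idea behind both replies, caching a shared parent so that several output jobs reuse one computation instead of recomputing it, can be sketched without a cluster. The following plain-Python sketch (all names hypothetical, no Spark required) uses a call counter to show the recomputation that `cache()` on the shared parent RDD/DataFrame would avoid:

```python
# Plain-Python sketch of caching a shared parent computation.
# In Spark this corresponds to calling parent.cache() before
# running multiple actions (jobs) that derive from it.
compute_calls = 0

def compute_parent():
    """Stands in for the expensive shared parent RDD computation."""
    global compute_calls
    compute_calls += 1
    return [x * x for x in range(10)]

# Without caching: each "output job" recomputes the parent.
out_a = sum(compute_parent())   # job 1
out_b = max(compute_parent())   # job 2
assert compute_calls == 2       # parent computed twice

# With caching: compute once, reuse for both outputs.
compute_calls = 0
cached = compute_parent()       # analogous to parent.cache()
out_a = sum(cached)             # job 1 reuses the cached data
out_b = max(cached)             # job 2 reuses the cached data
assert compute_calls == 1      # parent computed only once
```

As Jeff notes, even with the parent computed once, the two actions still run as sequential jobs; caching removes redundant work, not the sequencing.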