Hi,

It's supposed to work like this: share a single SparkContext across threads so that the threads can share datasets.
Ad 1. No.
Ad 2. Yes. See CrossValidator and similar validators in spark.ml.

Jacek

On 9 Jun 2016 7:29 p.m., "Brandon White" <bwwintheho...@gmail.com> wrote:
> For example, say I want to train two Linear Regressions and two GBT
> Regressions.
>
> Using different threads, Spark allows you to submit jobs at the same time
> (see: http://spark.apache.org/docs/latest/job-scheduling.html). If I
> schedule two or more training jobs and they are running at the same time:
>
> 1) Is there any risk that static worker variables or worker state could
> become corrupted, leading to incorrect calculations?
> 2) Is Spark ML designed for running two or more training jobs at the same
> time? Is this something the architects considered during implementation?
>
> Thanks,
>
> Brandon