Re: Apache Spark: Parallelization of Multiple Machine Learning ALgorithm

2017-09-05 Thread Bryan Cutler
much. It is a great help, I will try spark-sklearn. Prem From: Yanbo Liang Date: Tuesday, September 5, 2017 at 10:40 AM To: Patrick McCarthy Cc: "Timsina,

Re: Apache Spark: Parallelization of Multiple Machine Learning ALgorithm

2017-09-05 Thread Yanbo Liang
From: Yanbo Liang Date: Tuesday, September 5, 2017 at 10:40 AM To: Patrick McCarthy Cc: "Timsina, Prem", "user@spark.apache.org" <user@spark.apache.org> Subject: Re: Apache Spark: Parallelizatio

Re: Apache Spark: Parallelization of Multiple Machine Learning ALgorithm

2017-09-05 Thread Timsina, Prem
ay, September 5, 2017 at 10:40 AM To: Patrick McCarthy Cc: "Timsina, Prem", "user@spark.apache.org" Subject: Re: Apache Spark: Parallelization of Multiple Machine Learning ALgorithm Hi Prem, How large is your dataset? Can it fit on a single node? If not, Spark ML

Re: Apache Spark: Parallelization of Multiple Machine Learning ALgorithm

2017-09-05 Thread Yanbo Liang
Hi Prem, How large is your dataset? Can it fit on a single node? If not, Spark MLlib provides cross-validation (CrossValidator), which can run multiple machine learning algorithms in parallel on a distributed dataset and do parameter search. FYI: https://spark.apache.org/docs/latest/ml-tuning.html#cross-validation If
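
To make the idea in this reply concrete, here is a minimal single-machine sketch of what "cross-validation plus parameter search over several algorithms, evaluated in parallel" means. It deliberately uses only the Python standard library and is not the Spark MLlib or spark-sklearn API; the toy data, the two toy "algorithms", and every function name below are assumptions made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean

# Hypothetical toy data: (feature, label) pairs; label is 1 when the feature exceeds 5.
DATA = [(float(x), 1 if x > 5 else 0) for x in range(12)]


def k_folds(rows, k):
    """Yield (train, validation) splits; fold i validates on every k-th row from i."""
    for i in range(k):
        validation = rows[i::k]
        train = [row for j, row in enumerate(rows) if j % k != i]
        yield train, validation


def fit_threshold(train, threshold):
    """Fixed-cutoff classifier; the cutoff is the hyperparameter being searched."""
    return lambda x: 1 if x > threshold else 0


def fit_nearest_mean(train, _param):
    """Predict the class whose training-set mean is closer to x."""
    mean0 = mean(x for x, y in train if y == 0)
    mean1 = mean(x for x, y in train if y == 1)
    return lambda x: 1 if abs(x - mean1) < abs(x - mean0) else 0


def cross_val_accuracy(rows, fit, param, k):
    """Average validation accuracy of one (algorithm, parameter) pair over k folds."""
    scores = []
    for train, validation in k_folds(rows, k):
        model = fit(train, param)
        hits = sum(model(x) == y for x, y in validation)
        scores.append(hits / len(validation))
    return mean(scores)


# Each candidate is (name, fit function, hyperparameter value).
CANDIDATES = [
    ("threshold", fit_threshold, 2.0),
    ("threshold", fit_threshold, 5.0),
    ("threshold", fit_threshold, 8.0),
    ("nearest_mean", fit_nearest_mean, None),
]


def parallel_model_search(rows, candidates, k=3):
    """Score every candidate concurrently; returns (name, param, score) tuples."""
    def evaluate(candidate):
        name, fit, param = candidate
        return name, param, cross_val_accuracy(rows, fit, param, k)

    # Threads keep the sketch simple; they show the structure of the search,
    # not a real CPU speed-up for pure-Python work.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(evaluate, candidates))


if __name__ == "__main__":
    results = parallel_model_search(DATA, CANDIDATES)
    print(max(results, key=lambda r: r[2]))
```

Each candidate is scored independently of the others, which is what makes this kind of search easy to run in parallel, whether across threads on one node as here or across a cluster.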

Re: Apache Spark: Parallelization of Multiple Machine Learning ALgorithm

2017-09-05 Thread Patrick McCarthy
You might benefit from watching this JIRA issue: https://issues.apache.org/jira/browse/SPARK-19071 On Sun, Sep 3, 2017 at 5:50 PM, Timsina, Prem wrote: > Is there a way to parallelize multiple ML algorithms in Spark? My use case is something like this: A) Run multiple machine learning alg