Hi, Myles,

We do not install scikit-learn or spark-sklearn on EMR clusters by default,
but you can install them yourself by running "sudo pip install
scikit-learn spark-sklearn", either by SSHing to the master instance and
running the command manually or by running it as an EMR Step.
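
Once the install has finished, a quick import check can confirm that the
packages are visible to Python. The snippet below is only a sketch: the
helper name missing_modules is made up for illustration, and it assumes
the import names are "sklearn" and "spark_sklearn". It checks only the
interpreter it runs under, so it says nothing about other nodes.

```python
import importlib


def missing_modules(names):
    """Return the module names that cannot be imported by this interpreter."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Assumed import names for the scikit-learn and spark-sklearn packages.
    missing = missing_modules(["sklearn", "spark_sklearn"])
    if missing:
        print("Not importable: " + ", ".join(missing))
    else:
        print("scikit-learn and spark-sklearn are importable.")
```

Run it with the same python that pyspark uses; if anything is listed as
not importable, the pip install most likely went to a different
interpreter.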

~ Jonathan

On Tue, Mar 1, 2016 at 3:20 PM Gartland, Myles <myles.gartl...@rockhurst.edu>
wrote:

> New to Spark and MLlib. Coming from scikit-learn.
>
> I am launching my Spark 1.6 instance through AWS EMR and pyspark. All the
> examples using MLlib work fine.
>
> But I have seen a couple of examples where you can combine scikit-learn
> packages and syntax with MLlib.
>
> Like in this example-
> https://databricks.com/blog/2016/02/08/auto-scaling-scikit-learn-with-spark.html
>
> However, it does not seem that PySpark on AWS EMR comes with scikit-learn
> (or other standard PyData packages) loaded.
>
> Is this something you can/should load on pyspark and how would you do it?
>
> Thanks for assisting.
>
>
> Myles
>
