Hi Myles,

We do not install scikit-learn or spark-sklearn on EMR clusters by default, but you can install them yourself by running "sudo pip install scikit-learn spark-sklearn", either manually after ssh'ing to the master instance or as an EMR Step.
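
If you go the EMR Step route, the step could be put together roughly as in the sketch below. Treat it as an illustration under assumptions rather than a tested recipe: it assumes you submit the step from Python with boto3, that command-runner.jar is available on your EMR release, and that you supply your own cluster ID. The step name and the helper functions are made up for this example.

```python
# Sketch only: builds an EMR Step definition that runs the pip install above.
# Assumptions (not from the original message): boto3 is used to submit the
# step, command-runner.jar is available on the cluster's EMR release, and
# the cluster ID is a placeholder supplied by the caller.

PACKAGES = ["scikit-learn", "spark-sklearn"]


def build_pip_install_step(packages):
    """Return a step definition that runs `sudo pip install <packages>`."""
    return {
        "Name": "Install Python packages",  # hypothetical step name
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["sudo", "pip", "install"] + list(packages),
        },
    }


def submit_step(cluster_id, step):
    """Submit the step to a running cluster; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so building the step needs no AWS setup

    emr = boto3.client("emr")
    response = emr.add_job_flow_steps(JobFlowId=cluster_id, Steps=[step])
    return response["StepIds"]


# Building the step is side-effect free; nothing is sent to AWS here.
step = build_pip_install_step(PACKAGES)
```

To actually submit it you would call something like submit_step("j-XXXXXXXXXXXXX", step), where the cluster ID shown is only a placeholder for your own.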
~ Jonathan

On Tue, Mar 1, 2016 at 3:20 PM Gartland, Myles <myles.gartl...@rockhurst.edu> wrote:

> New to Spark and MLlib. Coming from scikit-learn.
>
> I am launching my Spark 1.6 instance through AWS EMR and pyspark. All the
> examples using MLlib work fine.
>
> But I have seen a couple of examples where you can combine scikit-learn
> packages and syntax with MLlib.
>
> Like in this example:
> https://databricks.com/blog/2016/02/08/auto-scaling-scikit-learn-with-spark.html
>
> However, it does not seem that pyspark on AWS EMR comes with scikit-learn (or
> other standard pydata packages) loaded.
>
> Is this something you can/should load on pyspark, and how would you do it?
>
> Thanks for assisting.
>
> Myles