Maybe this is helpful:
https://github.com/lensacom/sparkit-learn/blob/master/README.rst
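In case it is useful, here is a minimal sketch of what I think is going on, using plain PySpark and NumPy rather than the library linked above. As far as I understand, sc.parallelize() iterates over the array, so each RDD element ends up being one row (a 1-D array) instead of the full 2-D matrix. The helper names rows_to_blocks and to_block_rdd are made up for illustration, and the sketch assumes an already running SparkContext is passed in as sc.

```python
import numpy as np


def rows_to_blocks(rows_iter):
    """Stack the rows of one partition back into a single 2-D array.

    Hypothetical helper: returns a one-element list holding the block,
    or an empty list if the partition has no rows.
    """
    rows = list(rows_iter)
    return [np.vstack(rows)] if rows else []


def to_block_rdd(sc, X, num_partitions=2):
    """Turn a 2-D NumPy array into an RDD of 2-D blocks.

    Hypothetical helper: `sc` is assumed to be an existing SparkContext.
    """
    # parallelize() walks over X, so every RDD element is a single row.
    rows_rdd = sc.parallelize(X, num_partitions)
    # Rebuild one 2-D block per partition so that code expecting a
    # matrix can be applied to each block.
    return rows_rdd.mapPartitions(rows_to_blocks)
```

Calling to_block_rdd(sc, X).collect() should then return a list of 2-D arrays, one per non-empty partition, which together contain the rows of X. Whether a given scikit-learn estimator can be fitted meaningfully on separate blocks is a different question and depends on the algorithm.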
Original message
From: Mustafa Elbehery
Date: 12/06/2015 3:59 PM (GMT-05:00)
To: user
Subject: PySpark RDD with NumpyArray
Hi All,
I would like to parallelize a NumPy array so that I can apply a scikit-learn
algorithm on top of Spark. When I call sc.parallelize(), I receive an RDD
with a different structure.
To be more precise, I am trying to obtain the following:
X = [[ 0.49426097 1.45106697]
[-1.42808099 -0.83706377]
[ 0.338559