Re: Standard preprocessing/scaling

Xiangrui Meng Wed, 28 May 2014 19:04:23 -0700

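RowMatrix has a method to compute column summary statistics
(computeColumnSummaryStatistics), which returns per-column means and
variances. There is a trade-off here because centering may densify the
data: subtracting a nonzero column mean turns the zero entries of a
sparse dataset into nonzeros. A utility function that centers data
would be useful for dense datasets.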
-Xiangrui
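
To illustrate the trade-off described above, here is a minimal sketch in
plain Python (standard library only). It is not MLlib code: the sparse-row
representation (a dict from column index to value) and the function names
column_stats, scale_only and center_and_scale are hypothetical, chosen
only for this example. It computes column means and population standard
deviations, then compares scaling without centering, which keeps zeros as
zeros, against full standardization, which fills in every entry.

```python
import math


def column_stats(rows, num_cols):
    """Return (means, stds) per column for sparse rows.

    Each row is a dict {column_index: value}; missing columns are zeros.
    The standard deviation is the population one (divide by n).
    """
    n = len(rows)
    sums = [0.0] * num_cols
    sq_sums = [0.0] * num_cols
    for row in rows:
        for j, v in row.items():
            sums[j] += v
            sq_sums[j] += v * v
    means = [s / n for s in sums]
    # max(..., 0.0) guards against tiny negative values caused by rounding.
    stds = [math.sqrt(max(q / n - m * m, 0.0))
            for q, m in zip(sq_sums, means)]
    return means, stds


def scale_only(rows, stds):
    """Divide by the column std without centering; rows stay sparse."""
    return [{j: (v / stds[j] if stds[j] > 0 else v) for j, v in row.items()}
            for row in rows]


def center_and_scale(rows, means, stds, num_cols):
    """Subtract the column mean, then divide by the std; rows become dense."""
    out = []
    for row in rows:
        dense = []
        for j in range(num_cols):
            centered = row.get(j, 0.0) - means[j]
            dense.append(centered / stds[j] if stds[j] > 0 else centered)
        out.append(dense)
    return out


# Four rows, three columns, only four stored (nonzero) entries.
rows = [{0: 2.0}, {1: 4.0}, {0: 4.0, 2: 1.0}, {}]
means, stds = column_stats(rows, 3)

sparse_scaled = scale_only(rows, stds)
dense_scaled = center_and_scale(rows, means, stds, 3)

stored_sparse = sum(len(r) for r in sparse_scaled)
stored_dense = sum(1 for r in dense_scaled for v in r if v != 0.0)
print(stored_sparse, stored_dense)
```

In this toy input the scaled-only data still stores 4 entries, while the
centered data has all 12 entries nonzero. On a large, very sparse dataset
the same effect can multiply memory use, which is why centering is easier
to justify for dense data.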


On Wed, May 28, 2014 at 5:03 AM, dataginjaninja
<[email protected]> wrote:
> I searched on this, but didn't find anything general so I apologize if this
> has been addressed.
>
> Many algorithms (SGD, SVM...) either will not converge or will run forever
> if the data is not scaled. scikit-learn has a preprocessing function
> <http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.scale.html>
> that will subtract the mean and divide by the standard deviation. Of course there are
> a few options with it as well.
>
> Is there something in the works for this?
>
>
>
> --
> View this message in context: 
> http://apache-spark-developers-list.1001551.n3.nabble.com/Standard-preprocessing-scaling-tp6826.html
> Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
