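RowMatrix has a method to compute column summary statistics (computeColumnSummaryStatistics), which gives the per-column mean and variance needed for scaling. There is a trade-off, though: centering (subtracting the mean) may densify sparse data, because zero entries become nonzero. A utility function that centers data would still be useful for dense datasets.

-Xiangrui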
On Wed, May 28, 2014 at 5:03 AM, dataginjaninja <rickett.stepha...@gmail.com> wrote:
> I searched on this, but didn't find anything general so I apologize if this
> has been addressed.
>
> Many algorithms (SGD, SVM...) either will not converge or will run forever
> if the data is not scaled. Sci-kit has preprocessing
> <http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.scale.html>
> that will subtract the mean and divide by standard dev. Of course there are
> a few options with it as well.
>
> Is there something in the works for this?
>
> --
> View this message in context:
> http://apache-spark-developers-list.1001551.n3.nabble.com/Standard-preprocessing-scaling-tp6826.html
> Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
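
To make the densification trade-off from the reply concrete, here is a minimal sketch in plain Python. It is not Spark or scikit-learn code. The names column_stats and scale, the dict-based sparse row format, and the with_mean / with_std flags are all hypothetical and chosen only for illustration. It assumes the sample standard deviation (dividing by n - 1) and at least two rows.

```python
import math


def column_stats(rows, num_cols):
    """Per-column mean and sample standard deviation.

    Each row is a sparse dict {column_index: value}; missing entries are zero.
    (Hypothetical helper, not a Spark API.)
    """
    n = len(rows)
    sums = [0.0] * num_cols
    sq_sums = [0.0] * num_cols
    for row in rows:
        for j, v in row.items():
            sums[j] += v
            sq_sums[j] += v * v
    means = [s / n for s in sums]
    # Assumption: unbiased sample variance (n - 1); clamp tiny negatives from rounding.
    variances = [max((sq - n * m * m) / (n - 1), 0.0)
                 for sq, m in zip(sq_sums, means)]
    return means, [math.sqrt(v) for v in variances]


def scale(rows, num_cols, with_mean=False, with_std=True):
    """Scale sparse rows; centering is optional because it densifies them."""
    means, stds = column_stats(rows, num_cols)
    scaled = []
    for row in rows:
        if with_mean:
            # Subtracting the mean touches every column, so the row becomes dense.
            new_row = {j: row.get(j, 0.0) - means[j] for j in range(num_cols)}
        else:
            # Without centering, zeros stay zero and the row stays sparse.
            new_row = dict(row)
        if with_std:
            # Constant columns (std == 0) are mapped to 0.0 to avoid dividing by zero.
            new_row = {j: (v / stds[j] if stds[j] > 0 else 0.0)
                       for j, v in new_row.items()}
        scaled.append(new_row)
    return scaled
```

Calling scale(rows, num_cols) keeps the number of stored entries unchanged, while scale(rows, num_cols, with_mean=True) stores num_cols entries for every row. That is why dividing by the standard deviation alone is the cheaper option for sparse data, and centering suits dense datasets better.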