Sometimes in this case I just standardize without centering, and I
still get good results.
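
To give an idea of what I mean, here is a rough Scala sketch built on the
column summary statistics that RowMatrix already exposes. The names
(scaleToUnitStd, data) are just placeholders, not anything in MLlib: it
divides each column by its standard deviation and skips the mean
subtraction, so sparse rows stay sparse.

import org.apache.spark.mllib.linalg.{SparseVector, Vector, Vectors}
import org.apache.spark.mllib.linalg.distributed.RowMatrix
import org.apache.spark.rdd.RDD

// Divide each column by its standard deviation, without subtracting the
// mean, so zero entries stay zero and sparse rows stay sparse.
def scaleToUnitStd(data: RDD[Vector]): RDD[Vector] = {
  val summary = new RowMatrix(data).computeColumnSummaryStatistics()
  val std = summary.variance.toArray.map(math.sqrt)
  val scale = (i: Int, x: Double) =>
    if (std(i) > 0.0) x / std(i) else x  // leave constant columns untouched

  data.map {
    case sv: SparseVector =>
      // only the stored (non-zero) entries need to be rescaled
      Vectors.sparse(sv.size, sv.indices,
        sv.indices.zip(sv.values).map { case (i, x) => scale(i, x) })
    case v =>
      Vectors.dense(v.toArray.zipWithIndex.map { case (x, i) => scale(i, x) })
  }
}

Because only the stored entries of a sparse vector are touched, skipping
the centering step avoids the densification trade-off Xiangrui mentions
below.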

Sent from my Google Nexus 5
On May 28, 2014 7:03 PM, "Xiangrui Meng" <men...@gmail.com> wrote:

> RowMatrix has a method to compute column summary statistics. There is
> a trade-off here because centering may densify the data. A utility
> function that centers data would be useful for dense datasets.
> -Xiangrui
>
> On Wed, May 28, 2014 at 5:03 AM, dataginjaninja
> <rickett.stepha...@gmail.com> wrote:
> > I searched on this, but didn't find anything general, so I apologize if
> > this has been addressed.
> >
> > Many algorithms (SGD, SVM, ...) either will not converge or will run
> > forever if the data is not scaled. scikit-learn has preprocessing
> > <http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.scale.html>
> > that will subtract the mean and divide by the standard deviation. Of
> > course there are a few options with it as well.
> >
> > Is there something in the works for this?
> >
>
