Re: Standard preprocessing/scaling

2014-05-29 Thread dataginjaninja
I do see the issue with centering sparse data. Actually, centering is less important than scaling by the standard deviation. Not having unit variance causes the convergence issues and long runtimes. Will RowMatrix compute the variance of a column?

Re: Standard preprocessing/scaling

2014-05-28 Thread DB Tsai
In this case, I sometimes just standardize without centering, and I still get good results. On May 28, 2014 7:03 PM, "Xiangrui Meng" wrote: > RowMatrix has a method to compute column summary statistics. There is > a trade-off here because centering may densify th
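
To illustrate standardizing without centering, here is a minimal sketch in plain Python. It is not Spark code and does not use the RowMatrix API; the dict-based sparse rows and the helper names `column_std` and `scale_without_centering` are assumptions made up for this example. Each stored entry is divided by its column's sample standard deviation, so implicit zeros stay zero and the data stays sparse.

```python
import math

def column_std(rows, num_cols):
    """Sample standard deviation per column.

    rows: list of sparse rows, each a {column_index: value} dict
    (a hypothetical representation chosen for this sketch).
    """
    n = len(rows)
    sums = [0.0] * num_cols
    sq_sums = [0.0] * num_cols
    for row in rows:
        for j, v in row.items():
            sums[j] += v
            sq_sums[j] += v * v
    stds = []
    for j in range(num_cols):
        mean = sums[j] / n
        # Unbiased variance: (sum of squares - n * mean^2) / (n - 1)
        var = (sq_sums[j] - n * mean * mean) / (n - 1)
        stds.append(math.sqrt(max(var, 0.0)))
    return stds

def scale_without_centering(rows, stds):
    """Divide stored entries by the column std; zero-variance columns are left as-is."""
    return [
        {j: (v / stds[j] if stds[j] > 0 else v) for j, v in row.items()}
        for row in rows
    ]

rows = [{0: 2.0}, {1: 3.0}, {0: 4.0, 2: 1.0}, {}]
stds = column_std(rows, 3)
scaled = scale_without_centering(rows, stds)
rescaled_stds = column_std(scaled, 3)  # each column should now have unit std
```

Only the entries that were already stored are touched, which is why this variant is cheap on sparse data.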

Re: Standard preprocessing/scaling

2014-05-28 Thread Xiangrui Meng
RowMatrix has a method to compute column summary statistics. There is a trade-off here because centering may densify the data. A utility function that centers data would be useful for dense datasets. -Xiangrui On Wed, May 28, 2014 at 5:03 AM, dataginjaninja wrote: > I searched on this, but didn't
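
The densification trade-off can be seen in a small plain-Python sketch. This is not Spark code; the dict-based sparse rows and the helper names `column_means` and `center` are assumptions for illustration only. After subtracting the column mean, every implicit zero becomes a nonzero value, so each row has to be stored densely.

```python
def column_means(rows, num_cols):
    """Mean of each column; rows are {column_index: value} dicts (hypothetical layout)."""
    sums = [0.0] * num_cols
    for row in rows:
        for j, v in row.items():
            sums[j] += v
    return [s / len(rows) for s in sums]

def center(rows, means):
    """Subtract the column mean from every entry, including implicit zeros."""
    return [[row.get(j, 0.0) - means[j] for j in range(len(means))] for row in rows]

rows = [{0: 2.0}, {1: 3.0}, {0: 4.0, 2: 1.0}, {}]
means = column_means(rows, 3)
centered = center(rows, means)

# Count stored nonzeros before and after centering
nnz_before = sum(len(r) for r in rows)
nnz_after = sum(1 for r in centered for v in r if v != 0.0)
```

In this toy input, 4 stored entries become 12 after centering, which is the full 4 x 3 matrix.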