I do see the issue with centering sparse data. Actually, the centering is less
important than the scaling by the standard deviation. Not having unit
variance causes the convergence issues and long runtimes.
Will RowMatrix compute the variance of a column?
Sometimes in this case, I just standardize (scale by the standard deviation)
without centering. I still get good results.
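
As a rough illustration of scaling without centering, here is a minimal
sketch in plain Python (not Spark code). The sparse-row layout (one dict
per row mapping column index to value) and the helper names column_std and
scale_without_centering are assumptions made for this example, and it uses
the population variance. Only stored entries are divided, so implicit
zeros stay zero and the data stays sparse.

```python
import math

def column_std(rows, num_cols):
    """Population std of each column; entries missing from a row count as 0."""
    n = len(rows)
    sums = [0.0] * num_cols
    sq_sums = [0.0] * num_cols
    for row in rows:
        for j, v in row.items():
            sums[j] += v
            sq_sums[j] += v * v
    stds = []
    for j in range(num_cols):
        mean = sums[j] / n
        var = sq_sums[j] / n - mean * mean
        stds.append(math.sqrt(max(var, 0.0)))  # guard against tiny negative rounding
    return stds

def scale_without_centering(rows, num_cols):
    """Divide each stored value by its column std; zero-variance columns are left as-is."""
    stds = column_std(rows, num_cols)
    return [
        {j: (v / stds[j] if stds[j] > 0 else v) for j, v in row.items()}
        for row in rows
    ]

# Hypothetical toy data: 4 rows, 2 columns, mostly zeros.
rows = [{0: 2.0}, {1: 3.0}, {0: 4.0, 1: 3.0}, {}]
scaled = scale_without_centering(rows, 2)
```

Each column of the scaled data has unit variance, because variance does
not depend on where the mean sits, and the number of stored entries is
unchanged.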
On May 28, 2014 7:03 PM, "Xiangrui Meng" wrote:
RowMatrix has a method to compute column summary statistics. There is
a trade-off here because centering may densify the data. A utility
function that centers data would be useful for dense datasets.
-Xiangrui
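
To make the densification trade-off concrete, here is a small sketch in
plain Python (not Spark code). The toy matrix and the helper names
center_columns and count_nonzeros are assumptions made for this example.
Subtracting a nonzero column mean turns every zero in that column into a
nonzero value.

```python
def center_columns(rows):
    """Subtract each column's mean from every entry in that column."""
    n = len(rows)
    num_cols = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(num_cols)]
    return [[r[j] - means[j] for j in range(num_cols)] for r in rows]

def count_nonzeros(rows):
    """Number of entries that a sparse format would have to store."""
    return sum(1 for r in rows for v in r if v != 0.0)

# Hypothetical toy data: 4 x 3 with only three nonzero entries.
data = [
    [2.0, 0.0, 0.0],
    [0.0, 0.0, 3.0],
    [0.0, 5.0, 0.0],
    [0.0, 0.0, 0.0],
]
centered = center_columns(data)

nnz_before = count_nonzeros(data)      # 3
nnz_after = count_nonzeros(centered)   # 12
```

Here all 12 entries become nonzero after centering, so a sparse format no
longer saves anything. On a dense dataset this cost does not arise, which
is why a centering utility fits that case better.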
On Wed, May 28, 2014 at 5:03 AM, dataginjaninja
wrote:
> I searched on this, but didn't