high GC in the Kmeans algorithm

lihu Wed, 11 Feb 2015 00:19:55 -0800

Hi,
    I am running KMeans (MLlib) on a cluster with 12 workers. Each worker
has 128 GB of RAM and 24 cores. I run 48 tasks on each machine. The total
data size is just 40 GB.


   When the dimension of the data set is about 10^7, each task takes about
30s, with a GC time of about 20s.

   When I reduce the dimension to 10^4, the GC time becomes small.

    So why is the GC time so high when the dimension is larger? Or is this
caused by MLlib?
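
    To put the two dimensions in perspective, here is a rough back-of-envelope
sketch of the arithmetic. It assumes, hypothetically, that every running task
keeps k dense vectors of 8-byte doubles at the full dimension. Nothing above
confirms that MLlib allocates memory this way, and k is not stated in this
post; k = 10 is a made-up value used only for illustration.

```python
# Back-of-envelope estimate only. The dimensions (10^4, 10^7) and the
# 48 tasks per machine come from the post; everything else is assumed.
BYTES_PER_DOUBLE = 8

def dense_buffer_bytes(dim, k, tasks):
    """Bytes used if each of `tasks` concurrent tasks holds k dense
    vectors of `dim` doubles (a hypothetical allocation pattern)."""
    return dim * BYTES_PER_DOUBLE * k * tasks

for dim in (10**4, 10**7):
    # k=10 is a hypothetical number of clusters, not taken from the post.
    total = dense_buffer_bytes(dim, k=10, tasks=48)
    print(f"dim={dim}: {total / 2**30:.3f} GiB per machine")
```

Under these assumptions the dense buffers would need about 36 GiB per machine
at dimension 10^7, versus about 0.04 GiB at 10^4, so the per-task working set
grows linearly with the dimension regardless of the 40 GB input size.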
