Hi,
    I  run the kmeans(MLlib) in a cluster with 12 workers.  Every work own
a 128G RAM, 24Core. I run 48 task in one machine. the total data is just
40GB.

   When the dimension of the data set is about 10^7, for every task the
duration is about 30s, but the cost for GC is about 20s.

   When I reduce the dimension to 10^4, then the gc is small.

    So why gc is so high when the dimension is larger? or this is the
reason caused by MLlib?

Reply via email to