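You need to increase the parallelism, or repartition the data into a larger number of partitions, to get rid of those long GC pauses. Each of your tasks is reading roughly 0.7 to 1.3 GB of shuffle data, so smaller partitions should leave each task with fewer live objects on the heap at once.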
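
As a rough illustration of picking a higher partition count, here is a minimal Python sketch. The helper names and the 128 MB per-task target are my own assumptions, not Spark settings or anything from your job; the right target depends on executor heap size and the rank you train with. The input sizes are the Shuffle Read values from the task table in the quoted message. The second function assumes PySpark's `pyspark.mllib.recommendation.ALS` and is only defined here, not run.

```python
import math


def suggest_num_partitions(task_input_mb, target_mb_per_task=128.0, min_partitions=1):
    """Hypothetical helper: estimate a partition count from observed task input sizes.

    target_mb_per_task is an assumed tuning knob, not a Spark configuration key.
    """
    total_mb = sum(task_input_mb)
    return max(min_partitions, math.ceil(total_mb / target_mb_per_task))


# Shuffle Read per task (MB), copied from the task table in the quoted message.
shuffle_read_mb = [1238.3, 1261.0, 834.4, 689.8, 803.6, 733.1, 950.5, 1132.0, 1304.2]

num_partitions = suggest_num_partitions(shuffle_read_mb)
print(num_partitions)  # 70 with the assumed 128 MB target


def train_als_with_more_partitions(ratings, rank, iterations, num_partitions):
    """Sketch only: repartition the ratings RDD and pass the same count as ALS blocks."""
    # Requires PySpark; imported lazily so the helper above runs without Spark.
    from pyspark.mllib.recommendation import ALS

    repartitioned = ratings.repartition(num_partitions)
    return ALS.train(repartitioned, rank, iterations, blocks=num_partitions)
```

With the nine tasks shown, about 8.9 GB of shuffle read in total, this suggests around 70 partitions instead of 9. Treat that as a starting point and adjust while watching the GC Time column in the UI.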

Thanks
Best Regards

On Tue, Mar 3, 2015 at 2:26 PM, lisendong <lisend...@163.com> wrote:

> Why is the GC time so long?
>
> I'm using ALS in MLlib, and the garbage collection time is too long
> (about 1/3 of the total time).
>
> I have tried some of the measures in the "Tuning Spark" guide, and tried
> setting the new generation memory size, but it still does not work...
>
> Tasks
>
> Task Index  Task ID  Status   Locality Level  Executor  Launch Time          Duration  GC Time  Result Ser Time  Shuffle Read  Write Time  Shuffle Write  Errors
> 1           2801     SUCCESS  PROCESS_LOCAL   h1.zw     2015/03/03 16:35:15  8.6 min   3.3 min                   1238.3 MB     57 ms       69.2 MB
> 0           2800     SUCCESS  PROCESS_LOCAL   h11.zw    2015/03/03 16:35:15  6.0 min   1.1 min                   1261.0 MB     55 ms       68.6 MB
> 2           2802     SUCCESS  PROCESS_LOCAL   h9.zw     2015/03/03 16:35:15  5.0 min   1.5 min                   834.4 MB      60 ms       69.6 MB
> 4           2804     SUCCESS  PROCESS_LOCAL   h4.zw     2015/03/03 16:35:15  4.4 min   59 s                      689.8 MB      62 ms       71.4 MB
> 3           2803     SUCCESS  PROCESS_LOCAL   h8.zw     2015/03/03 16:35:15  4.2 min   1.6 min                   803.6 MB      66 ms       71.5 MB
> 7           2807     SUCCESS  PROCESS_LOCAL   h6.zw     2015/03/03 16:35:15  4.3 min   1.4 min                   733.1 MB      9 s         66.5 MB
> 6           2806     SUCCESS  PROCESS_LOCAL   h10.zw    2015/03/03 16:35:15  6.4 min   3.1 min                   950.5 MB      68 ms       69.3 MB
> 5           2805     SUCCESS  PROCESS_LOCAL   h3.zw     2015/03/03 16:35:15  8.0 min   2.7 min                   1132.0 MB     64 ms       70.3 MB
> 8           2808     SUCCESS  PROCESS_LOCAL   h12.zw    2015/03/03 16:35:15  4.5 min   2.2 min                   1304.2 MB     60 ms       69.4 MB
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/gc-time-too-long-when-using-mllib-als-tp21891.html