Re: MLLib: LinearRegressionWithSGD performance

2014-11-24 Thread Yanbo
The metrics page shows that only two executors are working in parallel in each iteration, so you need to increase the degree of parallelism. Some tips that may help: increase "spark.default.parallelism"; use repartition() (or coalesce() with shuffle = true) to increase the number of partitions, since coalesce() without a shuffle can only reduce the partition count.
> On Nov 22, 2014, at 3:18 AM, Sameer Ti
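A minimal sketch of the partition-count rule behind the repartition()/coalesce() tip, written as plain Python rather than Spark code. The function names are hypothetical; they only model the documented behavior of RDD.coalesce(numPartitions, shuffle=False), which merges existing partitions and therefore cannot raise the count unless a shuffle is enabled, and of repartition(n), which is equivalent to coalesce(n, shuffle=True).

```python
def partitions_after_coalesce(current: int, requested: int, shuffle: bool = False) -> int:
    """Hypothetical model of the partition count RDD.coalesce produces.

    Without a shuffle, coalesce can only merge existing partitions, so asking
    for more partitions than currently exist leaves the count unchanged.
    """
    if current < 1 or requested < 1:
        raise ValueError("partition counts must be >= 1")
    if shuffle:
        # A shuffle redistributes records, so any target count is reachable.
        return requested
    return min(current, requested)


def partitions_after_repartition(current: int, requested: int) -> int:
    # repartition(n) always shuffles, so it can grow or shrink the count.
    return partitions_after_coalesce(current, requested, shuffle=True)


if __name__ == "__main__":
    # Hypothetical RDD with only 2 partitions and an arbitrary target of 48.
    print(partitions_after_coalesce(2, 48))                # 2: no growth without a shuffle
    print(partitions_after_coalesce(2, 48, shuffle=True))  # 48
    print(partitions_after_repartition(2, 48))             # 48
```

In PySpark the corresponding real calls would be rdd.repartition(48) or rdd.coalesce(48, shuffle=True), and rdd.getNumPartitions() reports the resulting count; 48 is only an example value.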

Re: MLLib: LinearRegressionWithSGD performance

2014-11-21 Thread Jayant Shekhar
Hi Sameer,

You can also use repartition() to increase the number of partitions, which creates more tasks.

-Jayant

On Fri, Nov 21, 2014 at 12:02 PM, Jayant Shekhar wrote:
> Hi Sameer,
>
> You can try increasing the number of executor-cores.
>
> -Jayant
>
> On Fri, Nov 21, 2014 at 11:18 AM, Sameer Tilak wrote:
>>
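Why more partitions help can be seen with a simplified scheduling model, again in plain Python and not Spark's actual scheduler. Assumptions (all hypothetical): each partition becomes one task, every record costs the same to process, a task starts as soon as any core is free, and there is no scheduling or shuffle overhead. The helper split_round_robin only approximates how a shuffle spreads records across partitions.

```python
import heapq


def split_round_robin(records, num_partitions):
    """Hypothetical helper: deal records into partitions round-robin.

    This only approximates how repartition() spreads data; it is not Spark code.
    """
    partitions = [[] for _ in range(num_partitions)]
    for i, record in enumerate(records):
        partitions[i % num_partitions].append(record)
    return partitions


def stage_makespan(partition_sizes, total_cores, cost_per_record=1.0):
    """Estimated wall-clock time of one stage under the simplified model.

    One task per partition; each task is handed to the core that frees up
    earliest, and a task's duration is proportional to its record count.
    """
    # Each entry is the time at which one core becomes free again.
    free_at = [0.0] * min(total_cores, max(len(partition_sizes), 1))
    heapq.heapify(free_at)
    for size in partition_sizes:
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + size * cost_per_record)
    return max(free_at)


if __name__ == "__main__":
    records = list(range(10_000))
    two = [len(p) for p in split_round_robin(records, 2)]
    forty = [len(p) for p in split_round_robin(records, 40)]
    print(stage_makespan(two, total_cores=40))    # 5000.0: 38 cores sit idle
    print(stage_makespan(forty, total_cores=40))  # 250.0: every core gets work
```

In practice each task carries some fixed launch overhead, so past a certain point adding partitions stops paying off; this model deliberately ignores that.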

Re: MLLib: LinearRegressionWithSGD performance

2014-11-21 Thread Jayant Shekhar
Hi Sameer,

You can try increasing the number of executor cores (the --executor-cores option of spark-submit).

-Jayant

On Fri, Nov 21, 2014 at 11:18 AM, Sameer Tilak wrote:
> Hi All,
> I have been using MLLib's linear regression and I have some questions
> regarding the performance. We have a cluster of 10 nodes -- each node has
> 24 co
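The executor-cores suggestion interacts with the partition count: the number of tasks a stage can run at once is bounded both by the total task slots and by the number of partitions. A small sketch of that arithmetic, assuming the standard relationship in which each executor runs executor_cores // spark.task.cpus tasks at a time (spark.task.cpus defaults to 1). The function names and executor counts are hypothetical.

```python
def task_slots(num_executors: int, executor_cores: int, task_cpus: int = 1) -> int:
    """Total tasks that can run at once across all executors.

    Each executor runs executor_cores // task_cpus tasks concurrently,
    mirroring --executor-cores and spark.task.cpus.
    """
    if task_cpus < 1 or executor_cores < task_cpus:
        raise ValueError("each executor needs at least task_cpus cores")
    return num_executors * (executor_cores // task_cpus)


def concurrent_tasks(num_partitions: int, slots: int) -> int:
    # A stage never runs more tasks at once than it has partitions.
    return min(num_partitions, slots)


if __name__ == "__main__":
    small = task_slots(num_executors=10, executor_cores=2)  # 20 slots
    big = task_slots(num_executors=10, executor_cores=8)    # 80 slots
    print(concurrent_tasks(2, small))    # 2
    print(concurrent_tasks(2, big))      # 2: more cores alone do not help
    print(concurrent_tasks(200, big))    # 80: now the extra cores are used
```

Raising --executor-cores from 2 to 8 on 10 executors lifts the slot count from 20 to 80, but a stage over a 2-partition RDD still runs only 2 tasks at a time, so more cores pay off only once there are enough partitions to fill them.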