The metrics page shows that only two executors are working in parallel in each iteration, so you need to increase the degree of parallelism. A couple of tips that may help: increase "spark.default.parallelism", and use repartition() or coalesce() to increase the number of partitions.
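As a rough sketch of those two tips (assuming the MLlib 1.x RDD API; the app name, file path, and partition count of 240 = 10 executors x 24 cores are illustrative, not taken from the thread), it might look like:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.util.MLUtils
import org.apache.spark.mllib.regression.LinearRegressionWithSGD

// Raise the default parallelism so shuffles produce more partitions.
// With 10 executors of 24 cores each, ~240 is a reasonable starting point.
val conf = new SparkConf()
  .setAppName("MedsLogistic")                  // illustrative app name
  .set("spark.default.parallelism", "240")
val sc = new SparkContext(conf)

// Hypothetical path; substitute the actual 800MB LibSVM file.
val data = MLUtils.loadLibSVMFile(sc, "/pathtomyapp/data.libsvm")

// The 104 tasks seen in the UI reflect the input's partition count;
// repartitioning spreads the work across all executor cores.
// Cache, since the iterative solver re-reads the data each iteration.
val repartitioned = data.repartition(240).cache()

val model = LinearRegressionWithSGD.train(repartitioned, numIterations = 100)
```

repartition() incurs a one-time shuffle of the 800MB input, which is usually cheap compared to running many iterations on under-utilized executors.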
> On Nov 22, 2014, at 3:18 AM, Sameer Tilak <ssti...@live.com> wrote:
>
> Hi All,
> I have been using MLlib's linear regression and I have some questions
> regarding its performance. We have a cluster of 10 nodes -- each node has
> 24 cores and 148GB of memory. I am running my app as follows:
>
> time spark-submit --class medslogistic.MedsLogistic --master yarn-client
> --executor-memory 6G --num-executors 10 /pathtomyapp/myapp.jar
>
> I am also going to experiment with the number of executors (reducing it);
> maybe that will give us different results.
>
> The input is an 800MB sparse file in LibSVM format. The total number of
> features is 150K. It takes approximately 70 minutes for the regression to
> finish. The job imposes very little load on CPU, memory, network, and
> disk. The total number of tasks is 104, and the total time is divided
> fairly uniformly across these tasks. I was wondering, is it possible to
> reduce the execution time further?
>
> <Screen Shot 2014-11-21 at 11.09.20 AM.png>
> <Screen Shot 2014-11-21 at 10.59.42 AM.png>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org