Before hardware optimization there is always software optimization.
Are you using a Dataset / DataFrame? Are you using the right data types (e.g. int
where int is appropriate; try to avoid string, char, etc.)?
Do you extract only the columns you need? What are the algorithm parameters?
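
To make the data type point concrete, here is a rough back-of-the-envelope sketch in plain Python. It only uses the row, column and instance counts from your message. The per-value byte widths are my assumptions (in particular the 20-byte average string is hypothetical, not measured), and it ignores JVM object overhead, compression and caching format, so treat the numbers as orders of magnitude only.

```python
# Rough in-memory footprint of the training set described in the quoted
# message, for different per-value storage widths. All overheads are ignored.

ROWS = 2_500_000   # from the original question
COLUMNS = 1_300    # from the original question
INSTANCES = 32     # from the original question

# Assumed bytes per value for a few representative column types.
# The string figure is a hypothetical average, not a measurement.
BYTES_PER_VALUE = {
    "int32": 4,
    "float64": 8,
    "string (assumed 20 bytes avg)": 20,
}


def footprint_gb(rows, columns, bytes_per_value):
    """Return the raw size in GB (10^9 bytes), ignoring object overhead."""
    return rows * columns * bytes_per_value / 1e9


def summarize():
    """Map each type name to (total GB, GB per instance)."""
    result = {}
    for name, width in BYTES_PER_VALUE.items():
        total = footprint_gb(ROWS, COLUMNS, width)
        result[name] = (total, total / INSTANCES)
    return result


if __name__ == "__main__":
    for name, (total, per_instance) in summarize().items():
        print(f"{name:32s} {total:7.1f} GB total, {per_instance:5.2f} GB per instance")
```

Under these assumptions the same table varies severalfold in size depending only on how the columns are typed, which is why I would check types and column selection before changing the cluster.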

> On 07 Jun 2016, at 13:09, Franc Carter <[email protected]> wrote:
> 
> 
> Hi,
> 
> I am training a RandomForest Regression Model on Spark-1.6.1 (EMR) and am 
> interested in how it might be best to scale it - e.g. more CPUs per instance, 
> more memory per instance, more instances etc.
> 
> I'm currently using 32 m3.xlarge instances for a training set with 2.5 
> million rows, 1300 columns and a total size of 31GB (parquet)
> 
> thanks
> 
> -- 
> Franc
