When using Spark ML's LogisticRegression, RandomForest, CrossValidator, etc., do we need to take anything into account in our code to make them scale with more CPUs, or do they scale automatically?
I am reading data from S3 and using a Pipeline to train a model. The job runs on a Spark cluster with 36 cores and 60 GB of RAM, but I don't see much resource usage. The job completes, but I expected Spark to use all of the available RAM and finish faster. So I'm wondering whether there is something specific we need to take into account, or whether my expectations are just wrong.
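For context, here is a minimal sketch of roughly what the training code looks like (the S3 path, feature columns, and parameter grid below are placeholders, not my actual values):

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}
import org.apache.spark.sql.SparkSession

object TrainModel {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("ml-training").getOrCreate()

    // Placeholder S3 path and column names -- the real dataset differs
    val df = spark.read.parquet("s3a://my-bucket/training-data/")

    // Assemble raw columns into a single feature vector
    val assembler = new VectorAssembler()
      .setInputCols(Array("f1", "f2", "f3"))
      .setOutputCol("features")

    val lr = new LogisticRegression()
      .setLabelCol("label")
      .setFeaturesCol("features")

    val pipeline = new Pipeline().setStages(Array(assembler, lr))

    // Small hyperparameter grid, tuned with cross-validation
    val paramGrid = new ParamGridBuilder()
      .addGrid(lr.regParam, Array(0.01, 0.1))
      .addGrid(lr.elasticNetParam, Array(0.0, 0.5))
      .build()

    val cv = new CrossValidator()
      .setEstimator(pipeline)
      .setEvaluator(new BinaryClassificationEvaluator())
      .setEstimatorParamMaps(paramGrid)
      .setNumFolds(3)

    val model = cv.fit(df)
    spark.stop()
  }
}
```

The training itself is just a `fit` call on the CrossValidator; I haven't done anything beyond that to parallelize it.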