When using Spark ML's LogisticRegression, RandomForest, CrossValidator, etc., do we need to take anything into consideration while coding to make them scale across more CPUs, or do they scale automatically?
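For context, the only explicit knob I have found so far is the parallelism setting on CrossValidator, which controls how many candidate models are fit concurrently. A minimal sketch (Scala; the estimator choice and grid values are just placeholders I made up):

    import org.apache.spark.ml.classification.LogisticRegression
    import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
    import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}

    val lr = new LogisticRegression()

    // Placeholder grid; the real one would cover the params being tuned.
    val grid = new ParamGridBuilder()
      .addGrid(lr.regParam, Array(0.01, 0.1))
      .build()

    val cv = new CrossValidator()
      .setEstimator(lr)
      .setEvaluator(new BinaryClassificationEvaluator())
      .setEstimatorParamMaps(grid)
      .setNumFolds(3)
      .setParallelism(4) // fit up to 4 models at once (Spark 2.3+)

Is this the kind of consideration that matters, or should the individual algorithms parallelize on their own?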

I am reading some data from S3 and using a pipeline to train a model. I am running the job on a Spark cluster with 36 cores and 60 GB of RAM, but I cannot see much resource usage. The job runs, but I was expecting Spark to use all of the available RAM and finish faster. That is why I am wondering whether there is something particular we need to take into consideration, or whether my expectations are simply wrong.
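For reference, the job looks roughly like the sketch below (simplified; the bucket path, column names, and repartition count are placeholders, not my actual values):

    import org.apache.spark.ml.Pipeline
    import org.apache.spark.ml.classification.RandomForestClassifier
    import org.apache.spark.ml.feature.VectorAssembler
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("s3-training").getOrCreate()

    // Hypothetical bucket and schema; the real job reads from S3 the same way.
    val df = spark.read.parquet("s3a://my-bucket/training-data/")
      .repartition(72) // roughly 2x the 36 cores so every task slot gets work

    val assembler = new VectorAssembler()
      .setInputCols(Array("f1", "f2"))
      .setOutputCol("features")

    val rf = new RandomForestClassifier().setLabelCol("label")

    val model = new Pipeline()
      .setStages(Array(assembler, rf))
      .fit(df)

I added the repartition call on the assumption that too few input partitions could leave most cores idle, but I am not sure that is the right lever here.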
