When using Spark ML's LogisticRegression, RandomForest, CrossValidator, etc., do we need to take anything into consideration while coding to make them scale across more CPUs, or do they scale automatically?
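For context, the only explicit knob I have found so far is the parallelism setting on CrossValidator, which controls how many candidate models are fit concurrently. A minimal sketch (Scala; the estimator choice and grid values are just placeholders I made up):

    import org.apache.spark.ml.classification.LogisticRegression
    import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
    import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}

    val lr = new LogisticRegression()

    // Placeholder grid; the real one would cover the params being tuned.
    val grid = new ParamGridBuilder()
      .addGrid(lr.regParam, Array(0.01, 0.1))
      .build()

    val cv = new CrossValidator()
      .setEstimator(lr)
      .setEvaluator(new BinaryClassificationEvaluator())
      .setEstimatorParamMaps(grid)
      .setNumFolds(3)
      .setParallelism(4) // fit up to 4 models at once (Spark 2.3+)

Is this the kind of consideration that matters, or should the individual algorithms parallelize on their own?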

I am reading some data from S3 and using a pipeline to train a model. I am running the job on a Spark cluster with 36 cores and 60 GB of RAM, but I cannot see much resource usage. The job runs, but I was expecting Spark to use all of the available RAM and finish faster. That is why I am wondering whether there is something particular we need to take into consideration, or whether my expectations are simply wrong.
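For reference, the job looks roughly like the sketch below (simplified; the bucket path, column names, and repartition count are placeholders, not my actual values):

    import org.apache.spark.ml.Pipeline
    import org.apache.spark.ml.classification.RandomForestClassifier
    import org.apache.spark.ml.feature.VectorAssembler
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("s3-training").getOrCreate()

    // Hypothetical bucket and schema; the real job reads from S3 the same way.
    val df = spark.read.parquet("s3a://my-bucket/training-data/")
      .repartition(72) // roughly 2x the 36 cores so every task slot gets work

    val assembler = new VectorAssembler()
      .setInputCols(Array("f1", "f2"))
      .setOutputCol("features")

    val rf = new RandomForestClassifier().setLabelCol("label")

    val model = new Pipeline()
      .setStages(Array(assembler, rf))
      .fit(df)

I added the repartition call on the assumption that too few input partitions could leave most cores idle, but I am not sure that is the right lever here.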
