Spark LogisticRegression got stuck on dataset with millions of columns

Qian He Mon, 22 Apr 2019 17:04:07 -0700

Hi all,

I'm using Spark's built-in LogisticRegression to fit a dataset. Each row of
the data has 1.7 million columns, but it is sparse, with only hundreds of
1s. The Spark UI reported high GC time while the model was being trained, and
my Spark application then hung without any response. I have allocated 100
executors with 8g each.


Is there anything I should do to make the training process complete successfully?
