Hi all, I'm using Spark's LogisticRegression to fit a dataset. Each row has 1.7 million columns, but the data is sparse, with only hundreds of 1s per row. The Spark UI reports high GC time while the model is being trained, and then my Spark application hangs without any response. I have allocated 100 executors with 8 GB of memory each.
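For a sense of scale, here is a rough back-of-the-envelope sketch of what one row costs if it is stored densely versus sparsely. It assumes 300 non-zeros per row as a stand-in for "hundreds of 1s", and it counts only raw array sizes (8-byte doubles, 4-byte int indices), ignoring JVM object overhead, so treat the numbers as estimates only.

```python
# Back-of-the-envelope memory estimate for one row of the dataset described above.
# ASSUMPTION: "hundreds of 1s" is modeled as 300 non-zeros per row (hypothetical).
# Sizes count only raw arrays (8-byte doubles, 4-byte int indices) and ignore
# JVM object headers, boxing, and any other overhead.

NUM_FEATURES = 1_700_000   # columns per row, from the post
NNZ_PER_ROW = 300          # hypothetical stand-in for "hundreds of 1s"
DOUBLE_BYTES = 8
INT_BYTES = 4


def dense_row_bytes(num_features: int) -> int:
    """Bytes for a dense row: one double per feature, zeros included."""
    return num_features * DOUBLE_BYTES


def sparse_row_bytes(nnz: int) -> int:
    """Bytes for a sparse row: one int index plus one double value per non-zero."""
    return nnz * (INT_BYTES + DOUBLE_BYTES)


if __name__ == "__main__":
    dense = dense_row_bytes(NUM_FEATURES)
    sparse = sparse_row_bytes(NNZ_PER_ROW)
    print(f"dense row:  {dense / 1e6:.1f} MB")   # 13.6 MB
    print(f"sparse row: {sparse / 1e3:.1f} KB")  # 3.6 KB
    print(f"ratio:      {dense / sparse:.0f}x")  # about 3778x
```

Under these assumptions a dense row is roughly 13.6 MB and a sparse one roughly 3.6 KB. Any dense array with one entry per feature (for example, a weight vector over all 1.7 million columns) costs the same 13.6 MB, so it is worth checking that nothing in the pipeline densifies the rows.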
Is there anything I should do to make the training succeed?