Could you provide your code and details of the cluster you are running on?

On Tue, Apr 23, 2019 at 4:10 PM Qian He <hq.ja...@gmail.com> wrote:
> The dataset was already in a sparse representation before being fed into
> LogisticRegression.
>
> On Tue, Apr 23, 2019 at 3:15 PM Weichen Xu <weichen...@databricks.com>
> wrote:
>
>> Hi Qian,
>>
>> Does your dataset use the sparse vector format?
>>
>> On Mon, Apr 22, 2019 at 5:03 PM Qian He <hq.ja...@gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> I'm using the LogisticRegression provided by Spark to fit a dataset.
>>> Each row of the data has 1.7 million columns, but it is sparse, with
>>> only hundreds of 1s. The Spark UI reported high GC time while the model
>>> was being trained, and my Spark application got stuck without any
>>> response. I have allocated 100 executors, with 8g for each executor.
>>>
>>> Is there anything I should do to make the training process complete
>>> successfully?
>>>
>>
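
As a rough illustration of why the sparse-format question matters for the data described in the original message, here is a back-of-the-envelope sketch in plain Python. The 1.7 million column count comes from the thread. Everything else is an assumption made for illustration: 300 non-zeros stands in for "hundreds of 1s", and the byte sizes (8-byte values, 4-byte indices) count the raw payload only and ignore JVM object overhead.

```python
# Rough per-row payload size: dense layout vs. sparse (index, value) layout.
# Only NUM_COLUMNS comes from the thread; the other constants are assumptions.
NUM_COLUMNS = 1_700_000   # columns per row, as stated in the thread
NNZ_PER_ROW = 300         # assumption: stands in for "hundreds of 1s"
BYTES_PER_DOUBLE = 8      # assumption: values stored as 8-byte doubles
BYTES_PER_INT = 4         # assumption: indices stored as 4-byte ints


def dense_row_bytes(num_columns):
    """Payload of a dense row: one double per column."""
    return num_columns * BYTES_PER_DOUBLE


def sparse_row_bytes(nnz):
    """Payload of a sparse row: one index and one double per non-zero."""
    return nnz * (BYTES_PER_INT + BYTES_PER_DOUBLE)


dense = dense_row_bytes(NUM_COLUMNS)
sparse = sparse_row_bytes(NNZ_PER_ROW)

print(f"dense row:  {dense / 1e6:.1f} MB")
print(f"sparse row: {sparse / 1e3:.1f} KB")
print(f"ratio:      {dense / sparse:.0f}x")
```

Under these assumptions a dense row carries megabytes of values while a sparse row carries only kilobytes, so any step that densifies rows would multiply memory pressure by several thousand. This does not diagnose the GC problem reported above; it only shows the scale involved.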