Re: Spark LogisticRegression got stuck on dataset with millions of columns

2019-04-23 Thread Weichen Xu
Could you provide your code and details of the cluster you are running on?

On Tue, Apr 23, 2019 at 4:10 PM Qian He wrote:
> The dataset was already in a sparse representation before being fed into
> LogisticRegression.
>
> On Tue, Apr 23, 2019 at 3:15 PM Weichen Xu wrote:
>> Hi Qian,
>>
>> Does your dataset use a sparse

Re: Spark LogisticRegression got stuck on dataset with millions of columns

2019-04-23 Thread Qian He
The dataset was already in a sparse representation before being fed into LogisticRegression.

On Tue, Apr 23, 2019 at 3:15 PM Weichen Xu wrote:
> Hi Qian,
>
> Does your dataset use a sparse vector format?
>
> On Mon, Apr 22, 2019 at 5:03 PM Qian He wrote:
>> Hi all,
>>
>> I'm using the Spark-provided Lo

Re: Spark LogisticRegression got stuck on dataset with millions of columns

2019-04-23 Thread Weichen Xu
Hi Qian,

Does your dataset use a sparse vector format?

On Mon, Apr 22, 2019 at 5:03 PM Qian He wrote:
> Hi all,
>
> I'm using the Spark-provided LogisticRegression to fit a dataset. Each row of
> the data has 1.7 million columns, but it is sparse with only hundreds of
> 1s. The Spark UI reported hig
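
To give a sense of scale for the sparse-versus-dense question raised in this thread, here is a rough back-of-the-envelope sketch in plain Python. It only does arithmetic and does not call Spark. The 1.7 million column count comes from the original post; the figure of 300 non-zero entries per row is an assumption standing in for "hundreds", and the 8-byte value and 4-byte index sizes assume double-precision values with 32-bit integer indices. Object and header overhead is ignored, so the numbers are approximate.

```python
# Rough per-row storage estimate for dense vs. sparse feature vectors.
# Assumptions (not from the thread): 8-byte float values, 4-byte int indices,
# no object/header overhead, and 300 non-zeros per row as a stand-in for "hundreds".

def dense_row_bytes(num_features, bytes_per_value=8):
    """A dense row stores one value for every column, zero or not."""
    return num_features * bytes_per_value

def sparse_row_bytes(num_nonzeros, bytes_per_index=4, bytes_per_value=8):
    """A sparse row stores one (index, value) pair per non-zero entry."""
    return num_nonzeros * (bytes_per_index + bytes_per_value)

NUM_FEATURES = 1_700_000   # from the original post
ASSUMED_NONZEROS = 300     # hypothetical value for "hundreds of 1s"

dense = dense_row_bytes(NUM_FEATURES)
sparse = sparse_row_bytes(ASSUMED_NONZEROS)

print(f"dense row:  {dense / 1e6:.1f} MB")
print(f"sparse row: {sparse / 1e3:.1f} KB")
print(f"ratio:      about {dense // sparse}x")
```

Under these assumptions a dense row takes about 13.6 MB while a sparse row takes a few kilobytes, which is why the input representation is worth ruling out first. The thread as shown here does not establish what actually caused the job to stall.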