Re: Spark LogisticRegression got stuck on dataset with millions of columns

Weichen Xu Tue, 23 Apr 2019 16:35:47 -0700

Could you provide your code and some information about the cluster you're running on?

On Tue, Apr 23, 2019 at 4:10 PM Qian He <hq.ja...@gmail.com> wrote:


> The dataset was already in a sparse representation before being fed into
> LogisticRegression.
>
> On Tue, Apr 23, 2019 at 3:15 PM Weichen Xu <weichen...@databricks.com>
> wrote:
>
>> Hi Qian,
>>
>> Does your dataset use the sparse vector format?
>>
>>
>>
>> On Mon, Apr 22, 2019 at 5:03 PM Qian He <hq.ja...@gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> I'm using Spark's built-in LogisticRegression to fit a dataset. Each row
>>> of the data has 1.7 million columns, but it is sparse, with only hundreds of
>>> 1s. The Spark UI reported high GC time while the model was being trained, and
>>> my Spark application got stuck without any response. I have allocated 100
>>> executors with 8 GB of memory each.
>>>
>>> Is there anything I should do to make the training process complete
>>> successfully?
>>>
>>
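A rough back-of-envelope sketch (not from the thread) of why a sparse input format may not be sufficient on its own at this width: each input row shrinks dramatically, but the model's coefficient and gradient vectors are dense and as wide as the full feature space. The figures below assume 8-byte doubles, 4-byte int indices, a binary model with an intercept, and an L-BFGS history size of 10; the count of 500 non-zeros per row is a hypothetical stand-in for "hundreds". Real JVM overhead will be higher than these raw payload sizes.

```python
# Back-of-envelope memory estimates for logistic regression on very wide,
# sparse data. Assumes 8-byte doubles and 4-byte int indices (raw payload only,
# no JVM object overhead).

DOUBLE_BYTES = 8
INT_BYTES = 4
MB = 1_000_000  # decimal megabytes, for readability


def dense_row_bytes(num_features: int) -> int:
    """Payload of one row stored densely: one double per column."""
    return num_features * DOUBLE_BYTES


def sparse_row_bytes(num_nonzeros: int) -> int:
    """Payload of one row stored sparsely: an int index plus a double value per non-zero."""
    return num_nonzeros * (INT_BYTES + DOUBLE_BYTES)


def coefficient_bytes(num_features: int, num_classes: int = 1, fit_intercept: bool = True) -> int:
    """Size of one dense coefficient (or gradient) vector."""
    return num_classes * (num_features + int(fit_intercept)) * DOUBLE_BYTES


def lbfgs_history_bytes(num_features: int, history: int = 10) -> int:
    """Assumed L-BFGS state: `history` pairs of correction vectors, each as wide as the model."""
    return 2 * history * coefficient_bytes(num_features)


if __name__ == "__main__":
    n = 1_700_000  # columns, from the question
    nnz = 500      # hypothetical: "hundreds" of non-zeros per row
    print(f"dense row:      {dense_row_bytes(n) / MB:.1f} MB")
    print(f"sparse row:     {sparse_row_bytes(nnz) / MB:.4f} MB")
    print(f"coefficients:   {coefficient_bytes(n) / MB:.1f} MB")
    print(f"L-BFGS history: {lbfgs_history_bytes(n) / MB:.1f} MB")
```

Under these assumptions a sparse row is roughly 6 KB instead of 13.6 MB, yet every dense coefficient or gradient buffer is still about 13.6 MB, and the optimizer history alone is around 272 MB. Memory pressure from such buffers, rather than from the input rows, could plausibly contribute to the GC time described above; this is a hypothesis to check, not a diagnosis.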
