It's really simple: https://gist.github.com/ezhulenev/7777886517723ca4a353

We see the same strange heap behavior even for a single model: it takes
~20 GB of driver heap to build one model with fewer than 1 million rows
in the input DataFrame.
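
For reference, the single-model fit boils down to roughly the following (a minimal
sketch of the spark.ml usage, not the gist's exact code; the input path, column names,
and parameters here are placeholders):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext
    import org.apache.spark.ml.classification.LogisticRegression

    val sc = new SparkContext(new SparkConf().setAppName("lr-driver-heap"))
    val sqlContext = new SQLContext(sc)

    // ~1 million rows with a Double "label" column and a Vector "features" column
    val training = sqlContext.read.parquet("/path/to/training") // placeholder path

    val lr = new LogisticRegression()
      .setMaxIter(100)   // illustrative parameters
      .setRegParam(0.01)

    // fit() runs the optimization on the executors; only the fitted weights
    // and intercept come back to the driver
    val model = lr.fit(training)
    println(s"weights: ${model.weights}, intercept: ${model.intercept}")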

On Wed, Sep 23, 2015 at 6:31 PM, DB Tsai <dbt...@dbtsai.com> wrote:

> Could you paste some of your code for diagnosis?
>
>
> Sincerely,
>
> DB Tsai
> ----------------------------------------------------------
> Blog: https://www.dbtsai.com
> PGP Key ID: 0xAF08DF8D
> <https://pgp.mit.edu/pks/lookup?search=0x59DF55B8AF08DF8D>
>
> On Wed, Sep 23, 2015 at 3:19 PM, Eugene Zhulenev <
> eugene.zhule...@gmail.com> wrote:
>
>> We are running Apache Spark 1.5.0 (latest code from the 1.5 branch).
>>
>> We are running 2-3 LogisticRegression models in parallel (we'd love to
>> run 10-20, actually). They are not really big at all, maybe 1-2 million rows
>> per model.
>>
>> The cluster itself and all executors look good: enough free memory and no
>> exceptions or errors.
>>
>> However, I see very strange behavior inside the Spark driver: the allocated
>> heap grows constantly, reaching 30 GB in 1.5 hours, and then everything
>> becomes super slow.
>>
>> We don't do any collect, and I really don't understand what is consuming
>> all this memory. It looks like something inside LogisticRegression
>> itself; however, I only see treeAggregate, which should not require so much
>> memory to run.
>>
>> Any ideas?
>>
>> Plus, I don't see any GC pauses, so it looks like the memory is still
>> referenced by something inside the driver.
>>
>> [image: Inline image 2]
>> [image: Inline image 1]
>>
>
>
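
For completeness, the parallel setup described above (a few LogisticRegression models
trained concurrently from one driver) is roughly the following. This is only a hedged
sketch: the helper names, thread pool, and datasets are illustrative, not our actual code.

    import scala.concurrent.{Await, Future}
    import scala.concurrent.duration.Duration
    import scala.concurrent.ExecutionContext.Implicits.global

    import org.apache.spark.ml.classification.{LogisticRegression, LogisticRegressionModel}
    import org.apache.spark.sql.DataFrame

    // Hypothetical helper: each call submits its own Spark jobs
    def trainOne(df: DataFrame): LogisticRegressionModel =
      new LogisticRegression().setMaxIter(100).setRegParam(0.01).fit(df)

    // Train a handful of models concurrently; Spark's scheduler interleaves the jobs
    def trainAll(datasets: Seq[DataFrame]): Seq[LogisticRegressionModel] =
      Await.result(Future.sequence(datasets.map(df => Future(trainOne(df)))), Duration.Inf)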
