It's really simple: https://gist.github.com/ezhulenev/7777886517723ca4a353
We've seen the same strange heap behavior even for a single model: it takes
~20 GB of heap on the driver to build a single model with fewer than 1 million
rows in the input data frame.

On Wed, Sep 23, 2015 at 6:31 PM, DB Tsai <dbt...@dbtsai.com> wrote:

> Could you paste some of your code for diagnosis?
>
> Sincerely,
>
> DB Tsai
> ----------------------------------------------------------
> Blog: https://www.dbtsai.com
> PGP Key ID: 0xAF08DF8D
> <https://pgp.mit.edu/pks/lookup?search=0x59DF55B8AF08DF8D>
>
> On Wed, Sep 23, 2015 at 3:19 PM, Eugene Zhulenev <
> eugene.zhule...@gmail.com> wrote:
>
>> We are running Apache Spark 1.5.0 (latest code from the 1.5 branch).
>>
>> We are running 2-3 LogisticRegression models in parallel (we'd love to
>> run 10-20 actually). They are not really big at all, maybe 1-2 million
>> rows in each model.
>>
>> The cluster itself and all executors look good: enough free memory and no
>> exceptions or errors.
>>
>> However, I see very strange behavior inside the Spark driver. The
>> allocated heap grows constantly, up to 30 GB in 1.5 hours, and then
>> everything becomes super sloooooow.
>>
>> We don't call collect anywhere, and I really don't understand what is
>> consuming all this memory. It looks like something inside
>> LogisticRegression itself, but I only see treeAggregate, which should not
>> require so much memory to run.
>>
>> Any ideas?
>>
>> Also, I don't see any GC pauses; it looks like the memory is still
>> referenced by something inside the driver.
>>
>> [image: Inline image 2]
>> [image: Inline image 1]
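For context, a minimal sketch of the kind of setup described above: a few
ml.LogisticRegression fits launched concurrently from the driver. The column
names, hyperparameters, and Futures-based parallelism are assumptions for
illustration, not the contents of the linked gist.

// Sketch (assumed setup, not the actual gist code): train several
// LogisticRegression models in parallel on separate DataFrames.
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

import org.apache.spark.ml.classification.{LogisticRegression, LogisticRegressionModel}
import org.apache.spark.sql.DataFrame

object ParallelLogisticRegression {

  // Fit one model; each DataFrame is assumed to have "features" and "label" columns.
  def fit(training: DataFrame): LogisticRegressionModel = {
    new LogisticRegression()
      .setMaxIter(100)
      .setRegParam(0.01)
      .fit(training)
  }

  // Launch several fits concurrently from the driver; each fit runs its own
  // treeAggregate jobs on the cluster while the driver thread waits.
  def fitAll(datasets: Seq[DataFrame]): Seq[LogisticRegressionModel] = {
    val futures = datasets.map(df => Future(fit(df)))
    futures.map(f => Await.result(f, Duration.Inf))
  }
}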