Are you running locally? I hit exactly the same issue. Two solutions:

- reduce the data size
- run on EMR

HTH
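For the first option, a minimal sketch of downsampling before fitting (assumes a Spark 2.x Scala session; the input path and the 25% fraction are hypothetical, and `train` stands in for the ~20M-row, 130-feature dataset with the usual label/features columns):

import org.apache.spark.sql.SparkSession
import org.apache.spark.ml.classification.RandomForestClassifier

val spark = SparkSession.builder().appName("rf-oom-workaround").getOrCreate()

// Hypothetical input path, standing in for the dataset described below.
val train = spark.read.parquet("/path/to/train")

// Shrink the data before fitting, e.g. keep a 25% sample.
val sampled = train.sample(withReplacement = false, fraction = 0.25, seed = 42L)

// Same settings as in the original post.
val rf = new RandomForestClassifier()
  .setMaxDepth(30)
  .setMaxBins(32)
  .setNumTrees(100)

val model = rf.fit(sampled)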
On 10 Jan 2017 10:07 am, "Julio Antonio Soto" <ju...@esbet.es> wrote:

> Hi,
>
> I am running into OOM problems while training a Spark ML
> RandomForestClassifier (maxDepth of 30, 32 maxBins, 100 trees).
>
> My dataset is arguably pretty big given the executor count and size
> (8x5G), with approximately 20M rows and 130 features.
>
> The "fun fact" is that a single DecisionTreeClassifier with the same
> specs (same maxDepth and maxBins) is able to train without problems
> in a couple of minutes.
>
> AFAIK the current random forest implementation grows each tree
> sequentially, which means that DecisionTreeClassifiers are fit one by
> one, and therefore the training process should be similar in terms of
> memory consumption. Am I missing something here?
>
> Thanks,
> Julio
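For reference, the single-tree baseline Julio describes would look roughly like this (a sketch reusing the hypothetical `train` DataFrame from above; per the post, this trains in a couple of minutes with the same depth/bins settings):

import org.apache.spark.ml.classification.DecisionTreeClassifier

// Same maxDepth and maxBins as the forest, but a single tree.
val dt = new DecisionTreeClassifier()
  .setMaxDepth(30)
  .setMaxBins(32)

val dtModel = dt.fit(train)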