Are you running locally? I ran into exactly the same issue.
Two solutions:
- Reduce the data size.
- Run on EMR.
HTH
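
For reference, a minimal Scala sketch of option 1 (downsampling before
fitting). It assumes you already have a training DataFrame df with
label/features columns and an estimator rf; the 20% fraction and the seed
are purely illustrative:

// Downsample the training data before fitting to cut the memory footprint.
val sampled = df.sample(withReplacement = false, fraction = 0.2, seed = 42L)

// Then fit the same estimator on the smaller set:
// val model = rf.fit(sampled)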

On 10 Jan 2017 10:07 am, "Julio Antonio Soto" <ju...@esbet.es> wrote:

> Hi,
>
> I am running into OOM problems while training a Spark ML
> RandomForestClassifier (maxDepth of 30, 32 maxBins, 100 trees).
>
> My dataset is arguably pretty big given the executor count and size
> (8 executors x 5 GB each), with approximately 20M rows and 130 features.
>
> The "fun fact" is that a single DecisionTreeClassifier with the same specs
> (same maxDepth and maxBins) is able to train without problems in a couple
> of minutes.
>
> AFAIK the current random forest implementation grows each tree
> sequentially, which means that DecisionTreeClassifiers are fit one by one,
> and therefore the training process should be similar in terms of memory
> consumption. Am I missing something here?
>
> Thanks
> Julio
>
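
For anyone landing on this thread later: below is a Scala sketch of the
configuration described above, together with the spark.ml knobs that bound
training-time memory for tree ensembles (maxMemoryInMB, cacheNodeIds,
checkpointInterval). Whether these settings avoid this particular OOM is an
assumption on my part, not a verified fix.

import org.apache.spark.ml.classification.RandomForestClassifier

val rf = new RandomForestClassifier()
  .setMaxDepth(30)
  .setMaxBins(32)
  .setNumTrees(100)
  .setMaxMemoryInMB(256)      // caps memory for split-statistic aggregation (default 256)
  .setCacheNodeIds(true)      // cache node IDs on workers; recommended for deep trees
  .setCheckpointInterval(10)  // requires spark.sparkContext.setCheckpointDir(...) first

// val model = rf.fit(trainingData)  // trainingData: DataFrame with label/features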
