Re: Does spark's random forest need categorical features to be one hot encoded?

Ryan Thu, 23 Mar 2017 20:45:48 -0700

No, you don't need one-hot encoding. But the features column is a Vector,
and a Vector only holds numbers, so if a feature is a string you need a
StringIndexer to convert it to a numeric index first.

Here's an example:
http://spark.apache.org/docs/latest/ml-classification-regression.html#random-forest-classifier
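
To make the indexing step concrete, here is a rough plain-Python sketch of the
kind of mapping a StringIndexer produces: each distinct string becomes a
numeric index, with the most frequent label getting 0. It does not use Spark
at all. The function names (fit_string_index, one_hot), the sample data, and
the alphabetical tie-break are assumptions made for illustration only. The
one_hot helper is there just to show the expansion that the tree model does
not need.

```python
from collections import Counter


def fit_string_index(values):
    """Map each distinct string to a numeric index, most frequent first.

    Ties are broken alphabetically so the result is deterministic
    (an assumption of this sketch).
    """
    counts = Counter(values)
    ordered = sorted(counts, key=lambda label: (-counts[label], label))
    return {label: float(i) for i, label in enumerate(ordered)}


def one_hot(index, size):
    """Expand an index into a one-hot list (shown only for contrast)."""
    return [1.0 if i == int(index) else 0.0 for i in range(size)]


# Hypothetical categorical column
colors = ["red", "blue", "red", "green", "blue", "red"]

mapping = fit_string_index(colors)
indexed = [mapping[c] for c in colors]

print(mapping)
print(indexed)
```

The indexed column is a single numeric value per row, which is what can go
into the feature vector. The one-hot form would instead add one column per
category.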

On Thu, Mar 23, 2017 at 10:34 PM, Aseem Bansal <asmbans...@gmail.com> wrote:

> I was reading
> http://datascience.stackexchange.com/questions/5226/strings-as-features-in-decision-tree-random-forest
> and found that one-hot encoding needs to be done in sklearn. Is that
> required in Spark?
>
