No, you don't need one-hot encoding. But the features column is a vector, and a vector only holds numbers, so if a feature is a string you need to run it through a StringIndexer first.
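To make the indexing step concrete, here is a rough plain-Python sketch of the idea behind StringIndexer: each distinct string gets a numeric index, with the most frequent label getting 0.0. This is not Spark's code. The names `fit_string_index` and `transform` and the sample data are made up for illustration, and the alphabetical tie-break is an assumption I added to keep the result deterministic.

```python
from collections import Counter


def fit_string_index(values):
    """Build a label -> index map, most frequent label first (hypothetical helper)."""
    counts = Counter(values)
    # Sort by descending frequency; break ties alphabetically (an assumption
    # made here so the mapping is deterministic).
    ordered = sorted(counts, key=lambda label: (-counts[label], label))
    # Use floats, since the indexed column ends up as a numeric feature.
    return {label: float(i) for i, label in enumerate(ordered)}


def transform(values, index):
    """Replace each string with its numeric index (hypothetical helper)."""
    return [index[v] for v in values]


# Illustrative data only: one string-valued feature column.
colors = ["red", "blue", "red", "green", "red", "blue"]
color_index = fit_string_index(colors)
encoded = transform(colors, color_index)
print(color_index)
print(encoded)
```

In a real Spark job you would use StringIndexer itself for this step and then assemble the indexed column into the features vector along with the other numeric columns.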
Here's an example: http://spark.apache.org/docs/latest/ml-classification-regression.html#random-forest-classifier

On Thu, Mar 23, 2017 at 10:34 PM, Aseem Bansal <asmbans...@gmail.com> wrote:

> I was reading
> http://datascience.stackexchange.com/questions/5226/strings-as-features-in-decision-tree-random-forest
> and found that needs to be done in sklearn. Is that required in spark?