Until OneHotEncoder or LabelIndexer is merged, you can define a UDF
to do the mapping yourself:
val labelToIndex = udf { ... }
featureDF.withColumn("f3_dummy", labelToIndex(col("f3")))
See the UDF registration instructions here:
http://spark.apache.org/docs/latest/sql-programming-guide.html#udf-registration-moved-to-sqlconte
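
To fill in the elided body above, here is a minimal sketch of what the UDF could look like. The category table (f3Index), the -1.0 fallback for unseen labels, and the names LabelIndexing, indexOf and addF3Dummy are all assumptions for illustration; only udf, col and withColumn are taken from the snippet above. In practice you would build the table from the distinct values of your own column.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, udf}

object LabelIndexing {
  // Hypothetical label-to-index table for column "f3"; replace with your own categories.
  val f3Index: Map[String, Double] = Map("male" -> 0.0, "female" -> 1.0)

  // Plain Scala lookup; labels missing from the table map to -1.0 (an assumed convention).
  def indexOf(label: String): Double = f3Index.getOrElse(label, -1.0)

  // Wrap the lookup as a UDF so it can be applied to a DataFrame column.
  val labelToIndex = udf(indexOf _)

  // Returns featureDF with an extra double-valued column "f3_dummy".
  def addF3Dummy(featureDF: DataFrame): DataFrame =
    featureDF.withColumn("f3_dummy", labelToIndex(col("f3")))
}
```

Keeping the lookup as an ordinary function means you can check it without a cluster, and LabelIndexing.addF3Dummy(featureDF) then gives you the numeric column to use when building feature vectors.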
Hi folks, I currently have a DF that has a factor variable -- say gender.
I am hoping to use the RandomForest algorithm on this data, and it appears
that the DF needs to be converted to RDD[LabeledPoint] first -- i.e. all
features need to be encoded as doubles.
I see https://issues.apache.org/jira/browse/S