Yeah, another vote here for what's called one-hot encoding: convert the single categorical feature into N columns, where N is the number of distinct values of that feature. For each row, the column matching its value is set to one and the other N-1 columns are set to zero.
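
In case it helps, here is a rough sketch of the idea in plain Python, with no MLlib involved. The function name one_hot_encode and the sample colors are made up for illustration, and sorting the distinct values is just one way to get a stable column order.

```python
def one_hot_encode(values):
    """Return the distinct categories and one 0/1 row per input value."""
    # Sort the distinct values so the column order is deterministic.
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}

    encoded = []
    for value in values:
        # N columns, all zero except the one for this row's category.
        row = [0.0] * len(categories)
        row[index[value]] = 1.0
        encoded.append(row)
    return categories, encoded


# Hypothetical categorical feature with N = 3 distinct values.
colors = ["red", "green", "blue", "green"]
categories, encoded = one_hot_encode(colors)

for color, row in zip(colors, encoded):
    print(color, row)
```

You would then append these N columns to each row's numeric features before clustering. Note that this needs a full pass over the data to collect the distinct values first, and N can get large for high-cardinality features.
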
On Tue, Sep 16, 2014 at 2:16 PM, Sean Owen <so...@cloudera.com> wrote:
> I think it's on the table but not yet merged?
> https://issues.apache.org/jira/browse/SPARK-1216
>
> On Tue, Sep 16, 2014 at 10:04 PM, st553 <sthompson...@gmail.com> wrote:
> > Does MLlib provide utility functions to do this kind of encoding?
> >
> > --
> > View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Categorical-Features-for-K-Means-Clustering-tp9416p14394.html
> > Sent from the Apache Spark User List mailing list archive at Nabble.com.
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> > For additional commands, e-mail: user-h...@spark.apache.org