Does MLlib provide utility functions to do this kind of encoding?
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Categorical-Features-for-K-Means-Clustering-tp9416p14394.html
Sent from the Apache Spark User List mailing list archive at Nabble.com
I see. So, basically, it's like using dummy variables in a regression.
Thanks, Sean.
On Jul 11, 2014, at 10:11 AM, Sean Owen wrote:
> Since you can't define your own distance function, you will need to
> convert these to numeric dimensions. 1-of-n encoding can work OK,
> depending on your
Since you can't define your own distance function, you will need to
convert these to numeric dimensions. 1-of-n encoding can work OK,
depending on your use case. So a dimension that takes on 3 categorical
values becomes 3 dimensions, all of which are 0 except one that has
value 1.
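
To make the 1-of-n idea concrete, here is a minimal sketch in plain Python (no Spark required). The helper name one_of_n_encode, the sample rows, and the choice to order categories alphabetically are all assumptions made for illustration; they are not part of MLlib or of the original message.

```python
def one_of_n_encode(rows, categorical_index):
    """Replace one categorical column with 1-of-n indicator columns.

    Hypothetical helper for illustration only; not an MLlib function.
    """
    # Fix a stable order for the distinct categories (alphabetical here).
    categories = sorted({row[categorical_index] for row in rows})
    position = {category: i for i, category in enumerate(categories)}

    encoded = []
    for row in rows:
        # One indicator dimension per category: all 0 except a single 1.
        indicators = [0.0] * len(categories)
        indicators[position[row[categorical_index]]] = 1.0
        # Keep the remaining (already numeric) features unchanged.
        numeric = [float(v) for i, v in enumerate(row) if i != categorical_index]
        encoded.append(numeric + indicators)
    return categories, encoded


# Made-up sample data: one numeric feature and one 3-valued categorical feature.
rows = [(1.0, "red"), (2.5, "green"), (0.3, "blue"), (4.0, "red")]
categories, vectors = one_of_n_encode(rows, categorical_index=1)
```

Each row becomes a purely numeric vector (here 1 numeric dimension plus 3 indicator dimensions), which is the form a Euclidean-distance k-means expects. Note that the indicator dimensions may dominate or be dominated by the numeric ones depending on their scale, so some rescaling may be needed for a given use case.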
On Fri, Jul 11,
Hi Folks,
Does anyone have experience or recommendations on incorporating categorical
features (attributes) into k-means clustering in Spark? In other words, I want
to cluster on a set of attributes that include categorical variables.
I know I could probably implement some custom code to pars