
Re: Categorical Features for K-Means Clustering

2014-09-16 Thread Aris

Yeah, another vote here for what's called one-hot encoding: just convert the single categorical feature into N columns, where N is the number of distinct values of that feature. Each row gets a 1 in the column for its value and 0 in all the other columns.

On Tue, Sep 16, 2014 at 2:16 PM, Sean Owen wrote:
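A minimal plain-Python sketch of the one-hot encoding described above. `one_hot_encode` is a hypothetical helper, not an MLlib API, and sorting the distinct values to fix the column order is an arbitrary choice made here for reproducibility.

```python
def one_hot_encode(values):
    """Convert a list of categorical values into one-hot rows.

    Each distinct value gets its own column (columns are sorted so the
    order is stable); each row has a single 1 in the column for its value
    and 0 in every other column.
    """
    categories = sorted(set(values))
    index = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1
        rows.append(row)
    return categories, rows


categories, rows = one_hot_encode(["red", "green", "blue", "green"])
print(categories)  # ['blue', 'green', 'red']
print(rows)        # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```

The resulting rows can then be concatenated with any numeric features before clustering.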

Re: Categorical Features for K-Means Clustering

2014-09-16 Thread Sean Owen

I think it's on the table but not yet merged? https://issues.apache.org/jira/browse/SPARK-1216

On Tue, Sep 16, 2014 at 10:04 PM, st553 wrote:
> Does MLlib provide utility functions to do this kind of encoding?

Re: Categorical Features for K-Means Clustering

2014-09-16 Thread st553

Does MLlib provide utility functions to do this kind of encoding?

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Categorical-Features-for-K-Means-Clustering-tp9416p14394.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Categorical Features for K-Means Clustering

2014-07-11 Thread Wen Phan

I see. So it's basically like using dummy variables in regression. Thanks, Sean.

On Jul 11, 2014, at 10:11 AM, Sean Owen wrote:
> Since you can't define your own distance function, you will need to
> convert these to numeric dimensions. 1-of-n encoding can work OK,
> depending on your
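On the dummy-variable comparison: regression-style dummy coding usually drops one reference level (N-1 columns) to avoid collinearity with the intercept. A small sketch, assuming Euclidean distance and a hypothetical three-valued feature, of why that variant behaves differently under k-means: the reference level becomes the all-zeros vector, which sits closer to every other level than those levels sit to each other. `dummy_encode` and `euclidean` are illustrative helpers, not library functions.

```python
import math


def dummy_encode(value, categories):
    """Regression-style dummy coding: the first category is the reference
    level and is dropped, so it encodes as the all-zeros vector."""
    return [1.0 if value == c else 0.0 for c in categories[1:]]


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


categories = ["red", "green", "blue"]  # hypothetical feature; "red" is the reference
vec = {c: dummy_encode(c, categories) for c in categories}
print(vec)  # {'red': [0.0, 0.0], 'green': [1.0, 0.0], 'blue': [0.0, 1.0]}
print(euclidean(vec["red"], vec["green"]))   # reference vs. another level: 1.0
print(euclidean(vec["green"], vec["blue"]))  # two non-reference levels: about 1.414
```

Keeping all N columns, as in the 1-of-n scheme discussed in this thread, keeps every pair of distinct values equidistant instead of privileging the reference level.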

Re: Categorical Features for K-Means Clustering

2014-07-11 Thread Sean Owen

Since you can't define your own distance function, you will need to convert these to numeric dimensions. 1-of-n encoding can work OK, depending on your use case. So a dimension that takes on 3 categorical values becomes 3 dimensions, all of which are 0 except one, which has the value 1.

On Fri, Jul 11,
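To illustrate why the encoding matters when the distance function is fixed, the sketch below assumes plain Euclidean distance (k-means conventionally minimizes squared Euclidean distance) and a hypothetical three-valued feature with no natural order. Coding the values as integers 0, 1, 2 makes one pair twice as far apart as the others, implying an ordering the categories don't have, while 1-of-n encoding puts every pair of distinct values at the same distance, sqrt(2).

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


categories = ["red", "green", "blue"]  # hypothetical unordered feature

# Integer (label) encoding: one dimension holding 0, 1 or 2.
label = {c: [float(i)] for i, c in enumerate(categories)}

# 1-of-n encoding: three dimensions, a single 1 per vector.
one_hot = {c: [1.0 if j == i else 0.0 for j in range(len(categories))]
           for i, c in enumerate(categories)}

for a, b in combinations(categories, 2):
    print(f"{a}-{b}: label={euclidean(label[a], label[b]):.3f} "
          f"one-hot={euclidean(one_hot[a], one_hot[b]):.3f}")
# red-green: label=1.000 one-hot=1.414
# red-blue: label=2.000 one-hot=1.414
# green-blue: label=1.000 one-hot=1.414
```

The price of 1-of-n encoding is one extra dimension per distinct value, which is part of why it "can work OK, depending on your use case" rather than always.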
