To run KMeans after PCA, multiply the original RowMatrix (on the left) by the DenseMatrix of principal components using RowMatrix.multiply. The result is another RowMatrix, so you can get an RDD[Vector] from its .rows field and train KMeans on that.
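As a self-contained illustration of the projection step (plain Scala, no Spark; the object name PcaProjection, the helper project, and the sample data are all hypothetical), multiplying each data row by the principal-components matrix is just a matrix product:

```scala
object PcaProjection {
  // Multiply an n x d data matrix (one Array per row) by a d x k
  // principal-components matrix, yielding the n x k projected data.
  // Each output row is the original row expressed in the PCA basis.
  def project(data: Array[Array[Double]],
              pc: Array[Array[Double]]): Array[Array[Double]] =
    data.map { row =>
      Array.tabulate(pc(0).length) { j =>
        row.indices.map(i => row(i) * pc(i)(j)).sum
      }
    }

  def main(args: Array[String]): Unit = {
    // Hypothetical 2-D points projected onto a single component (1, 0),
    // which simply keeps the first coordinate of each point.
    val data = Array(Array(1.0, 2.0), Array(3.0, 4.0))
    val pc   = Array(Array(1.0), Array(0.0))
    val projected = project(data, pc)
    println(projected.map(_.mkString(",")).mkString(";")) // prints "1.0;3.0"
  }
}
```

In Spark this same product is what RowMatrix.multiply computes, distributed across the rows of the RDD; the projected rows are then what you cluster.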
Don't forget to broadcast the matrix returned from RowMatrix.computePrincipalComponents(); otherwise it gets serialized into every task closure and you can hit an OOME. Here's how to do it in Scala (didn't run the code, but it should be something like this):

val data: RowMatrix = ...
val pc = data.computePrincipalComponents(k) // k = target dimensionality
val bcPrincipalComponents = data.rows.context.broadcast(pc)
val newData: RowMatrix = data.multiply(bcPrincipalComponents.value)
KMeans.train(newData.rows, numClusters, numIterations)

Best,
Burak

----- Original Message -----
From: "st553" <sthompson...@gmail.com>
To: u...@spark.incubator.apache.org
Sent: Wednesday, September 17, 2014 12:21:38 PM
Subject: How to run kmeans after pca?

I would like to reduce the dimensionality of my data before running kmeans. The problem I'm having is that both RowMatrix.computePrincipalComponents() and RowMatrix.computeSVD() return a DenseMatrix, whereas KMeans.train() requires an RDD[Vector]. Does MLlib provide a way to do this conversion?

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-run-kmeans-after-pca-tp14473.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org