[ https://issues.apache.org/jira/browse/MAHOUT-236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jeff Eastman updated MAHOUT-236:
--------------------------------
Attachment: MAHOUT-236.patch
Here's a patch that adds a CDbw reference point MR job that iterates over the
clustered points passed to it. I had to change the clustered point output
formats to [clusterId :: VectorWritable], which required other changes,
mostly to unit tests. The patch includes three unit tests (Canopy, KMeans and a
partial Dirichlet).

It's a work in progress: I still need to make some more changes to get the fuzzy
kmeans tests to pass, and the Dirichlet process doesn't actually cluster points.
Run TestCDbwEvaluator to see some output from the reference point engine. I still
need to compute the final CDbw.
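
For readers who want a feel for what a reference point pass over [clusterId :: vector] records might do, here is a minimal sketch in plain Java. It is not the code in the patch. It assumes the farthest-first selection commonly described for CDbw (pick the point farthest from the cluster centroid, then repeatedly pick the point farthest from the representatives chosen so far), uses double[] as a stand-in for VectorWritable, and runs in memory rather than as an MR job. The class and method names (ReferencePointSketch, referencePoints, selectRepresentatives) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, in-memory sketch; not the MAHOUT-236 patch.
public class ReferencePointSketch {

  /** Euclidean distance between two equal-length vectors. */
  static double distance(double[] a, double[] b) {
    double sum = 0.0;
    for (int i = 0; i < a.length; i++) {
      double d = a[i] - b[i];
      sum += d * d;
    }
    return Math.sqrt(sum);
  }

  /** Component-wise mean of a non-empty list of points. */
  static double[] centroid(List<double[]> points) {
    double[] c = new double[points.get(0).length];
    for (double[] p : points) {
      for (int i = 0; i < c.length; i++) {
        c[i] += p[i];
      }
    }
    for (int i = 0; i < c.length; i++) {
      c[i] /= points.size();
    }
    return c;
  }

  /**
   * Assumed farthest-first selection: the first representative is the point
   * farthest from the centroid; each later one is the point whose distance
   * to its nearest already-chosen representative is largest.
   */
  static List<double[]> selectRepresentatives(List<double[]> points, int numReps) {
    List<double[]> reps = new ArrayList<>();
    if (points.isEmpty() || numReps <= 0) {
      return reps;
    }
    double[] center = centroid(points);
    double[] minDist = new double[points.size()];
    for (int i = 0; i < minDist.length; i++) {
      minDist[i] = distance(points.get(i), center);
    }
    while (reps.size() < numReps) {
      int best = 0;
      for (int i = 1; i < minDist.length; i++) {
        if (minDist[i] > minDist[best]) {
          best = i;
        }
      }
      if (!reps.isEmpty() && minDist[best] == 0.0) {
        break; // no distinct points left to choose
      }
      boolean first = reps.isEmpty();
      double[] chosen = points.get(best);
      reps.add(chosen);
      for (int i = 0; i < minDist.length; i++) {
        double d = distance(points.get(i), chosen);
        // After the first pick, distances are measured to representatives only.
        minDist[i] = first ? d : Math.min(minDist[i], d);
      }
    }
    return reps;
  }

  /** Groups (clusterId, vector) records by cluster id, then selects representatives per cluster. */
  static Map<Integer, List<double[]>> referencePoints(int[] clusterIds, double[][] vectors, int numReps) {
    Map<Integer, List<double[]>> byCluster = new TreeMap<>();
    for (int i = 0; i < clusterIds.length; i++) {
      byCluster.computeIfAbsent(clusterIds[i], k -> new ArrayList<>()).add(vectors[i]);
    }
    Map<Integer, List<double[]>> result = new TreeMap<>();
    for (Map.Entry<Integer, List<double[]>> e : byCluster.entrySet()) {
      result.put(e.getKey(), selectRepresentatives(e.getValue(), numReps));
    }
    return result;
  }

  public static void main(String[] args) {
    int[] ids = {1, 1, 1, 7};
    double[][] vecs = {{0, 0}, {4, 0}, {1, 0}, {5, 5}};
    referencePoints(ids, vecs, 2).forEach((id, reps) ->
        System.out.println("cluster " + id + ": " + reps.size() + " reference point(s)"));
  }
}
```

In a real MR job the grouping step would presumably be handled by the shuffle on clusterId rather than by an in-memory map, and the final CDbw value would be computed from these representatives in a later step.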
> Cluster Evaluation Tools
> ------------------------
>
> Key: MAHOUT-236
> URL: https://issues.apache.org/jira/browse/MAHOUT-236
> Project: Mahout
> Issue Type: New Feature
> Components: Clustering
> Reporter: Grant Ingersoll
> Attachments: MAHOUT-236.patch
>
>
> Per
> http://www.lucidimagination.com/search/document/10b562f10288993c/validating_clustering_output#9d3f6a55f4a91cb6,
> it would be great to have some utilities to help evaluate the effectiveness
> of clustering.