[ 
https://issues.apache.org/jira/browse/MAHOUT-236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Eastman updated MAHOUT-236:
--------------------------------

    Attachment: MAHOUT-236.patch

Here's a patch that adds a CDbw reference point MR job that iterates over the 
clustered points passed to it. I had to change the clustered point output 
formats to [clusterId :: VectorWritable], and that required other changes, 
mostly to unit tests. The patch includes three unit tests (Canopy, KMeans and 
a partial Dirichlet).
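
For readers unfamiliar with CDbw reference points, the following is a minimal
sketch in plain Java and is not the code in the patch. It assumes Euclidean
distance and a farthest-first selection rule, which is one common way to pick
representative points; the patch may do this differently. No Hadoop or Mahout
classes are used, and the names ReferencePointSketch, ClusteredPoint,
groupByCluster and referencePoints are hypothetical. ClusteredPoint stands in
for a [clusterId :: VectorWritable] record, and groupByCluster plays the role
that the shuffle phase would play in an MR job.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ReferencePointSketch {

    /** Hypothetical stand-in for one [clusterId :: VectorWritable] record. */
    public static final class ClusteredPoint {
        final int clusterId;
        final double[] vector;

        public ClusteredPoint(int clusterId, double[] vector) {
            this.clusterId = clusterId;
            this.vector = vector;
        }
    }

    /** Euclidean distance (an assumption; any distance measure could be used). */
    public static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Groups vectors by cluster id, as the shuffle would before a reducer runs. */
    public static Map<Integer, List<double[]>> groupByCluster(List<ClusteredPoint> points) {
        Map<Integer, List<double[]>> groups = new HashMap<>();
        for (ClusteredPoint p : points) {
            groups.computeIfAbsent(p.clusterId, k -> new ArrayList<>()).add(p.vector);
        }
        return groups;
    }

    /** Arithmetic mean of the member vectors. */
    public static double[] centroid(List<double[]> members) {
        double[] c = new double[members.get(0).length];
        for (double[] m : members) {
            for (int i = 0; i < c.length; i++) {
                c[i] += m[i];
            }
        }
        for (int i = 0; i < c.length; i++) {
            c[i] /= members.size();
        }
        return c;
    }

    /**
     * Farthest-first selection: the first reference point is the member
     * farthest from the centroid; each later one maximizes its distance to
     * the nearest reference point already chosen.
     */
    public static List<double[]> referencePoints(List<double[]> members, int count) {
        List<double[]> chosen = new ArrayList<>();
        double[] center = centroid(members);
        int limit = Math.min(count, members.size());
        while (chosen.size() < limit) {
            double[] best = null;
            double bestScore = -1.0;
            for (double[] candidate : members) {
                if (chosen.contains(candidate)) {
                    continue; // arrays compare by identity, so this skips picked members
                }
                double score;
                if (chosen.isEmpty()) {
                    score = distance(candidate, center);
                } else {
                    score = Double.MAX_VALUE;
                    for (double[] c : chosen) {
                        score = Math.min(score, distance(candidate, c));
                    }
                }
                if (score > bestScore) {
                    bestScore = score;
                    best = candidate;
                }
            }
            chosen.add(best);
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<ClusteredPoint> points = new ArrayList<>();
        points.add(new ClusteredPoint(1, new double[] {0.0, 0.0}));
        points.add(new ClusteredPoint(1, new double[] {1.0, 0.0}));
        points.add(new ClusteredPoint(1, new double[] {4.0, 0.0}));
        points.add(new ClusteredPoint(2, new double[] {10.0, 10.0}));
        groupByCluster(points).forEach((id, members) ->
            System.out.println(id + " -> " + referencePoints(members, 2).size() + " reference points"));
    }
}
```

Keeping the per-cluster work in referencePoints mirrors how a reducer would
receive all points for one clusterId at a time; the iteration over several
passes that the patch describes is not modeled here.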

It's a work in progress: I still need to make some more changes to get the 
Fuzzy KMeans tests to pass, and the Dirichlet process doesn't actually cluster 
points.

Run TestCDbwEvaluator to see some output from the reference point engine. I 
still need to compute the final CDbw.

> Cluster Evaluation Tools
> ------------------------
>
>                 Key: MAHOUT-236
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-236
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Clustering
>            Reporter: Grant Ingersoll
>         Attachments: MAHOUT-236.patch
>
>
> Per 
> http://www.lucidimagination.com/search/document/10b562f10288993c/validating_clustering_output#9d3f6a55f4a91cb6,
>  it would be great to have some utilities to help evaluate the effectiveness 
> of clustering.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
