You should move to 0.8 and explore ball k-means.

On Tue, Dec 3, 2013 at 8:44 PM, Prabhakar Srinivasan <[email protected]> wrote:

> Hello
> I am using Mahout 0.7 currently and this question is pertaining to that
> version. I am using Canopy clustering (CanopyDriver class) first to
> determine the optimal number of clusters that best fits the dataset and
> passing that information as parameter to Kmeans clustering (kmeansDriver
> class).
>
> Regards
> Prabhakar
>
>
> On Tue, Dec 3, 2013 at 6:00 PM, Ted Dunning <[email protected]> wrote:
>
> > Can you be more specific about which code you are asking about?
> >
> > The ball k-means implementation provides a capability somewhat like this,
> > but perhaps in a more clearly defined way.
> >
> >
> > On Tue, Dec 3, 2013 at 9:34 AM, Prabhakar Srinivasan
> > <[email protected]> wrote:
> >
> > > Hello!
> > > Can someone point me to some explanatory documentation for Outlier
> > > Detection & Removal in Clustering in Mahout. I am unable to understand
> > > the internal mechanism of outlier detection just by reading the Javadoc:
> > > clusterClassificationThreshold Is a clustering strictness / outlier
> > > removal parameter. Its value should be between 0 and 1. Vectors having
> > > pdf below this value will not be clustered.
> > >
> > > What does the pdf represent?
> > >
> > > Thanks
> > > Prabhakar
