On Tue, Dec 3, 2013 at 9:34 AM, Prabhakar Srinivasan <
[email protected]> wrote:

> Hello!
> Can someone point me to some explanatory documentation for Outlier
> Detection & Removal in Clustering in Mahout. I am unable to understand the
> internal mechanism of outlier detection just by reading the Javadoc:
> clusterClassificationThreshold Is a clustering strictness / outlier removal
> parameter. Its value should be between 0 and 1. Vectors having pdf below
> this value will not be clustered.
>
> What does the pdf represent?
>

I don't really know how this is implemented in Mahout, but I'd go out on a
limb and say the pdf value is the value of the probability density function
at that data point. (In outlier detection one usually estimates the
distribution of the data with some multidimensional density estimation
technique involving kernel functions, and then simply removes the highly
improbable values.)
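
To make the general idea concrete, here is a minimal sketch in Python. It is
not Mahout's code and I am not claiming Mahout works this way: it is
one-dimensional for brevity, uses a Gaussian kernel, and the names
gaussian_kde, remove_outliers, bandwidth and threshold are all made up for
illustration.

```python
import math


def gaussian_kde(points, bandwidth):
    """Build a 1-D Gaussian kernel density estimate from sample points.

    Hypothetical helper for illustration only; not part of Mahout.
    """
    n = len(points)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum one Gaussian bump per sample point, then normalise.
        return norm * sum(
            math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points
        )

    return density


def remove_outliers(points, bandwidth, threshold):
    """Keep only the points whose estimated density is at least threshold."""
    density = gaussian_kde(points, bandwidth)
    return [p for p in points if density(p) >= threshold]


# Five values near 1.0 and one far away (assumed toy data).
data = [1.0, 1.1, 0.9, 1.2, 0.8, 10.0]
kept = remove_outliers(data, bandwidth=0.5, threshold=0.2)
print(kept)
```

With this toy data the isolated value 10.0 gets a low estimated density and
is dropped, while the values near 1.0 are kept. The bandwidth and the
threshold are arbitrary choices here; in practice both need tuning.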


> Thanks
> Prabhakar
>
