Re: How to choose the initial clusters for K-means from TF-IDF vectors

Pat Ferrel Mon, 17 Nov 2014 15:55:44 -0800

Canopy is deprecated. 

It depends on what you want to do. Are you trying to find the documents closest
to some set that you have hand-classified?
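If the goal is the first option raised here, finding the documents closest to a hand-classified set, a minimal sketch could rank documents by cosine similarity to the mean of the labeled vectors. It assumes each TF-IDF vector has been exported as a sparse `{term: weight}` dict, which is not Mahout's own vector format, and the helper names `cosine`, `mean_vector` and `closest_documents` are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical sparse TF-IDF representation: term -> weight (not Mahout's Vector class).
Vector = Dict[str, float]

def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either has zero length."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)

def mean_vector(vectors: List[Vector]) -> Vector:
    """Component-wise mean of sparse vectors (missing terms count as 0)."""
    total: Vector = {}
    for v in vectors:
        for t, w in v.items():
            total[t] = total.get(t, 0.0) + w
    return {t: w / len(vectors) for t, w in total.items()}

def closest_documents(labeled: List[Vector], corpus: Dict[str, Vector],
                      top_n: int) -> List[Tuple[str, float]]:
    """Rank corpus documents by cosine similarity to the mean of the hand-labeled set."""
    center = mean_vector(labeled)
    scored = [(doc_id, cosine(center, vec)) for doc_id, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]
```

Averaging the labeled set into one reference vector keeps the ranking to a single similarity per document; scoring against each labeled document separately (and taking the maximum) is an equally reasonable alternative.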

All Canopy does is seed the initial centroids for k-means. K-means then
iterates towards "better" ones. You can do the same with your own docs, but they
won't remain the centroids after iteration.
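To illustrate that hand-picked documents only seed the centroids, here is a plain Lloyd-style k-means sketch that starts from chosen TF-IDF vectors. It is not Mahout's k-means driver: the sparse-dict representation, the squared Euclidean distance and the names `seeded_kmeans` and `sq_dist` are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical sparse TF-IDF representation: term -> weight.
Vector = Dict[str, float]

def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two sparse vectors."""
    terms = set(a) | set(b)
    return sum((a.get(t, 0.0) - b.get(t, 0.0)) ** 2 for t in terms)

def mean_vector(vectors: List[Vector]) -> Vector:
    """Component-wise mean of sparse vectors (missing terms count as 0)."""
    total: Vector = {}
    for v in vectors:
        for t, w in v.items():
            total[t] = total.get(t, 0.0) + w
    return {t: w / len(vectors) for t, w in total.items()}

def seeded_kmeans(docs: List[Vector], seeds: List[Vector],
                  max_iter: int = 20) -> Tuple[List[Vector], List[int]]:
    """Run k-means starting from the given seed vectors; return (centroids, labels)."""
    if not seeds:
        raise ValueError("need at least one seed vector")
    centroids = [dict(s) for s in seeds]  # copy so the seed documents are not mutated
    assignment: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each document goes to its nearest current centroid.
        new_assignment = [
            min(range(len(centroids)), key=lambda c: sq_dist(doc, centroids[c]))
            for doc in docs
        ]
        if new_assignment == assignment:
            break  # converged: no document changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [d for d, a in zip(docs, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = mean_vector(members)
    return centroids, assignment
```

With `docs = [{"a": 1.0}, {"a": 0.8, "b": 0.1}, {"b": 1.0}, {"b": 0.9, "a": 0.2}]` seeded by `docs[0]` and `docs[2]`, the first centroid ends up at roughly `{"a": 0.9, "b": 0.05}` rather than at the seed document itself.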

On Nov 17, 2014, at 2:11 PM, Sean Farrell <[email protected]> wrote:

Hi Donni,

I believe that the canopy clustering algorithm will do what you want,
though I haven't played around with it myself yet. The clustering chapter
in the 'Mahout in Action' book covers this fairly well.

Cheers,

Sean

----------------------
Dr Sean Farrell
Data Scientist
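For reference, the canopy idea mentioned above can be sketched as follows: take a remaining point as a canopy center, put every point within a loose threshold `t1` into its canopy, and drop points within a tight threshold `t2` from the candidate list. The resulting centers can then seed k-means. This is a textbook-style sketch with Euclidean distance and a deterministic choice of center, not Mahout's (now deprecated) Canopy implementation; `canopies` and `euclidean` are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]

def euclidean(a: Point, b: Point) -> float:
    """Straight-line distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def canopies(points: Sequence[Point], t1: float,
             t2: float) -> List[Tuple[Point, List[Point]]]:
    """Return (center, members) pairs; t1 is the loose threshold, t2 the tight one."""
    if t1 <= t2:
        raise ValueError("canopy clustering needs t1 > t2")
    remaining = list(points)
    result: List[Tuple[Point, List[Point]]] = []
    while remaining:
        # Take the next remaining point as a new canopy center
        # (the textbook algorithm picks one at random).
        center = remaining.pop(0)
        # Every point within t1 of the center joins this canopy; canopies may overlap.
        members = [p for p in points if euclidean(center, p) < t1]
        result.append((center, members))
        # Points within t2 are considered covered and can no longer become centers.
        remaining = [p for p in remaining if euclidean(center, p) >= t2]
    return result
```

The canopy centers (or the means of their members) would then be handed to k-means as its initial centroids.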

On Tue, Nov 18, 2014 at 12:01 AM, Donni Khan <[email protected]>
wrote:

> Hi All,
> 
> I'm working on text clustering. I want to select specific documents (as
> vectors) to be the centroids for k-means.
> I have created TF-IDF vectors for my dataset using Mahout, and I would like
> to choose the initial clusters from those TF-IDF vectors.
> 
> Does anyone have an idea how I can do this with Mahout?
> 
> Many thanks in advance.
> Donni
>
