Canopy is deprecated. It depends on what you want to do: are you trying to find the documents closest to some set that you have hand-classified?
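If that is the goal, a clustering step may not be needed at all. Here is a minimal sketch. It assumes the TF-IDF vectors have already been exported from Mahout into plain Python dicts of {term: weight}; that export step and the function names are hypothetical. The sketch ranks the unlabeled documents by cosine similarity to their nearest hand-labeled document:

```python
import math


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors stored as {term: weight}."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def closest_to_labeled(docs, labeled_ids, top_n=3):
    """Hypothetical helper: rank unlabeled docs by their best cosine
    similarity to any hand-labeled doc. labeled_ids must be non-empty."""
    scores = []
    for doc_id, vec in docs.items():
        if doc_id in labeled_ids:
            continue  # skip the hand-classified seeds themselves
        best = max(cosine(vec, docs[l]) for l in labeled_ids)
        scores.append((best, doc_id))
    scores.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scores[:top_n]]
```

The documents returned first are the ones most similar to your hand-classified set. Nothing is iterated, so the labeled documents stay fixed as the reference points.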
All Canopy does is seed centroids for k-means to start with. K-means then iterates towards "better" ones. You can do the same with your docs, but they won't remain the centroids after iteration.

On Nov 17, 2014, at 2:11 PM, Sean Farrell <[email protected]> wrote:

Hi Donni,

I believe that the canopy clustering algorithm will do what you want, though I haven't played around with it myself yet. The clustering chapter in the 'Mahout in Action' book covers this fairly well.

Cheers,
Sean

Dr Sean Farrell
Data Scientist

On Tue, Nov 18, 2014 at 12:01 AM, Donni Khan <[email protected]> wrote:

> Hi All,
>
> I'm working on text clustering. I want to select specific documents (as
> vectors) to be the centroids for k-means.
> I have created the TF-IDF vectors for my dataset using Mahout, and I would
> like to choose the initial clusters from those TF-IDF vectors.
>
> Does anyone have an idea how I can do this with Mahout?
>
> Many thanks in advance.
> Donni
>
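The point about seeding can be illustrated outside Mahout with plain Lloyd's k-means. This is a minimal sketch, not Mahout's implementation or API. For simplicity it uses dense Python lists and squared Euclidean distance; for text, a cosine distance measure is a common choice. The function name `kmeans_seeded` is hypothetical. It shows that hand-picked seed documents start out as the centroids but are replaced by cluster means after the first update step:

```python
def dist2(a, b):
    """Squared Euclidean distance between two dense vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_seeded(points, seeds, max_iter=100):
    """Lloyd's k-means starting from caller-chosen seed vectors.

    Returns (centroids, assignments). Seeds are copied, so the input
    documents are never modified.
    """
    centroids = [list(s) for s in seeds]
    assignments = [None] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_assignments = [
            min(range(len(centroids)), key=lambda k: dist2(p, centroids[k]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments
        # Update step: each centroid moves to the mean of its members,
        # which is why a seed document stops being the centroid.
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == k]
            if members:  # keep the previous centroid if a cluster empties
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignments
```

For example, take the points [1.0, 0.0], [0.9, 0.1], [0.8, 0.2], [0.0, 1.0], [0.1, 0.9] and seed with the first and fourth of them. The first centroid starts exactly at the document [1.0, 0.0] and ends at roughly [0.9, 0.1], the mean of its three members. The seed influenced which cluster formed, but it is no longer the centroid.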
