Hi,

Is this feature no longer supported in Mahout-0.7? Some threads say that it has moved to WeightedVectorWritables, but I am still not sure how I can pull labels out of clusters, other than the top terms from ClusterDumper.
K-Means clustering on vectors created from a Lucene index (via Mahout Lucene.Vectors) itself went well. I intentionally set minClusterSize to "1" because this exercise uses only 37 documents, from which 20 clusters were generated. Even so, ClusterLabels skips every cluster as too small:

[hadoop@localhost mahout-distribution-0.7]$ $MAHOUT_HOME/bin/mahout org.apache.mahout.utils.vectors.lucene.ClusterLabels --dir /home/hadoop/lia2e/indexes/MeetLucene/ --field contents --idField id --seqFileDir JAText-kmeans-clusters08/clusters-2-final --pointsDir JAText-kmeans-clusters08/clusteredPoints --minClusterSize 1 --maxLabels 5
Running on hadoop, using /usr/local/hadoop/bin/hadoop and HADOOP_CONF_DIR=
MAHOUT-JOB: /usr/local/mahout/mahout-examples-0.7-job.jar
13/03/03 21:54:59 WARN driver.MahoutDriver: No org.apache.mahout.utils.vectors.lucene.ClusterLabels.props found on classpath, will use command-line arguments only
13/03/03 21:55:00 INFO lucene.ClusterLabels: Skipping small cluster 0 with size: 2
13/03/03 21:55:00 INFO lucene.ClusterLabels: Skipping small cluster 2 with size: 2
13/03/03 21:55:00 INFO lucene.ClusterLabels: Skipping small cluster 5 with size: 5
13/03/03 21:55:00 INFO lucene.ClusterLabels: Skipping small cluster 6 with size: 2
13/03/03 21:55:00 INFO lucene.ClusterLabels: Skipping small cluster 9 with size: 1
13/03/03 21:55:00 INFO lucene.ClusterLabels: Skipping small cluster 10 with size: 2
13/03/03 21:55:00 INFO lucene.ClusterLabels: Skipping small cluster 13 with size: 1
13/03/03 21:55:00 INFO lucene.ClusterLabels: Skipping small cluster 15 with size: 1
13/03/03 21:55:00 INFO lucene.ClusterLabels: Skipping small cluster 18 with size: 1
13/03/03 21:55:00 INFO lucene.ClusterLabels: Skipping small cluster 19 with size: 3
13/03/03 21:55:00 INFO lucene.ClusterLabels: Skipping small cluster 20 with size: 1
13/03/03 21:55:00 INFO lucene.ClusterLabels: Skipping small cluster 23 with size: 1
13/03/03 21:55:00 INFO lucene.ClusterLabels: Skipping small cluster 24 with size: 1
13/03/03 21:55:00 INFO lucene.ClusterLabels: Skipping small cluster 29 with size: 2
13/03/03 21:55:00 INFO lucene.ClusterLabels: Skipping small cluster 30 with size: 1
13/03/03 21:55:00 INFO lucene.ClusterLabels: Skipping small cluster 31 with size: 2
13/03/03 21:55:00 INFO lucene.ClusterLabels: Skipping small cluster 32 with size: 2
13/03/03 21:55:00 INFO lucene.ClusterLabels: Skipping small cluster 34 with size: 4
13/03/03 21:55:00 INFO lucene.ClusterLabels: Skipping small cluster 35 with size: 1
13/03/03 21:55:00 INFO lucene.ClusterLabels: Skipping small cluster 36 with size: 2
13/03/03 21:55:00 INFO driver.MahoutDriver: Program took 911 ms (Minutes: 0.015183333333333333)
[hadoop@localhost mahout-distribution-0.7]$

This is an exercise of mine that combines Taming Text with Lucene in Action.

Regards,
Y.Mandai
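P.S. To double-check the log, here is a minimal Python sketch that tallies the "Skipping small cluster" lines from the run above. It is not part of Mahout; the function name `skipped_clusters` is made up for illustration, and the log lines are pasted in with their timestamp prefixes trimmed.

```python
import re

# Matches the message part of each ClusterLabels INFO line in the log above.
SKIP_LINE = re.compile(r"Skipping small cluster (\d+) with size: (\d+)")


def skipped_clusters(log_text):
    """Return {cluster_id: size} for every 'Skipping small cluster' line."""
    return {int(cid): int(size) for cid, size in SKIP_LINE.findall(log_text)}


# Message parts copied from the run above (timestamps and logger names trimmed).
LOG = """\
Skipping small cluster 0 with size: 2
Skipping small cluster 2 with size: 2
Skipping small cluster 5 with size: 5
Skipping small cluster 6 with size: 2
Skipping small cluster 9 with size: 1
Skipping small cluster 10 with size: 2
Skipping small cluster 13 with size: 1
Skipping small cluster 15 with size: 1
Skipping small cluster 18 with size: 1
Skipping small cluster 19 with size: 3
Skipping small cluster 20 with size: 1
Skipping small cluster 23 with size: 1
Skipping small cluster 24 with size: 1
Skipping small cluster 29 with size: 2
Skipping small cluster 30 with size: 1
Skipping small cluster 31 with size: 2
Skipping small cluster 32 with size: 2
Skipping small cluster 34 with size: 4
Skipping small cluster 35 with size: 1
Skipping small cluster 36 with size: 2
"""

skipped = skipped_clusters(LOG)
print(len(skipped), "clusters skipped,", sum(skipped.values()), "documents in total")
print("smallest skipped cluster size:", min(skipped.values()))
```

The tally accounts for all 20 clusters and all 37 documents, so no cluster got a label, even though none of them is smaller than the minClusterSize of 1 that I passed.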
