Michael,

1. build-reuters.sh is to be retired; use cluster-reuters.sh instead.
2. You are correct: the script does what is described in the wiki link.
________________________________
From: Michael Wechner <[email protected]>
To: [email protected]
Sent: Tuesday, September 10, 2013 9:55 AM
Subject: Question re cluster-reuters.sh

Hi

I have tried to follow and execute the steps described at https://cwiki.apache.org/confluence/display/MAHOUT/Quick+tour+of+text+analysis+using+the+Mahout+command+line but had trouble doing so because, for example, org.apache.lucene.benchmark.utils.ExtractReuters does not seem to be contained in mahout-distribution-0.8.

Then I found mahout-distribution-0.8/examples/bin/build-reuters.sh and ran it successfully. If I understand correctly, the script does basically the same thing as described at https://cwiki.apache.org/confluence/display/MAHOUT/Quick+tour+of+text+analysis+using+the+Mahout+command+line, right? If so, I would be happy to update the Wiki accordingly.

I also found the following post, http://bickson.blogspot.ch/2011/09/understanding-mahout-k-means-clustering.html, which seems to describe quite nicely what is happening behind the scenes. I think it would also make sense to add these notes to the Wiki. WDYT?

Thanks

Michael
