It seems that I've had the wrong idea the entire time. Thanks for the help.

On Fri, Sep 6, 2013 at 3:45 PM, Suneel Marthi <[email protected]> wrote:

> seq2sparse uses Lucene Standard tokenization to generate the tfidf
> vectors. But since your data is in CSV format (from the example you had
> provided below) you should be using Mahout's CSVVectorIterator to generate
> the vectors.
>
> See
> http://stackoverflow.com/questions/13663567/mahout-csv-to-vector-and-running-the-program
>
> Once you have generated the term vectors you also need to specify the -cl
> option to the kmeans CLI to generate the clusters.
> Also you don't have to generate the centroids upfront (unless it's
> something specific to your use case), kmeans would generate random k
> centroids during execution.
>
> ________________________________
> From: P Kal <[email protected]>
> To: [email protected]
> Sent: Friday, September 6, 2013 2:05 PM
> Subject: Kmeans - clustering help
>
>
> I'm trying to run a kmeans clustering on only numeric data
>
> This is how my data looks
> header1, header2, header3, header4, header5
> 0,0,0,0,0
> 1,3,2,4,5
> 3,2,4,5,6
> .
> .
> .
>
> about 3000 rows
>
> As the cluster centroids I created another file
> (0,0,0,0,0)
> (1,2,3,4,5)
>
> My understanding is that we'd have to change these text files to sequence
> files and then generate sparse vectors from this sequence file for kmeans
> clustering
>
> I've used seqdirectory followed by seq2sparse,
> and at the end I have two folders, one for input and one for centroids
>
> The input folder has dirs generated by seq2sparse on the input sequence file
> Similarly the centroids folder has dirs generated by seq2sparse on the
> centroids sequence file
>
> The command I use to run kmeans
>
> mahout kmeans --input input/tfidf-vectors --output output -c centroids/tfidf-vectors --maxIter 20
>
> and I get this error
>
> No input clusters found in centroids/tfidf-vectors Check your -c argument.
>
> The sequence files have data but the files generated by seq2sparse do not
> have any contents.
>
> Can someone please help.
>
> BTW all this is on hdfs and not local mode
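
For anyone finding this thread later, here is a rough sketch of what the CSV route amounts to: each numeric row becomes one dense vector directly, with no text tokenization involved. This is plain Python for illustration only, not Mahout code. The function names parse_numeric_csv and pick_random_seeds are made up for this example, and picking the seeds by sampling k input rows is only an assumption about what "random k centroids" could look like.

```python
import csv
import io
import random


def parse_numeric_csv(text, has_header=True):
    """Turn CSV text into a list of dense vectors (lists of floats)."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    rows = [row for row in reader if row]  # drop blank lines
    if has_header:
        rows = rows[1:]  # the header row is not a data point
    return [[float(cell) for cell in row] for row in rows]


def pick_random_seeds(vectors, k, rng=None):
    """Sample k distinct input rows as initial centroids (an assumption)."""
    rng = rng or random.Random()
    return rng.sample(vectors, k)


# Same shape as the data in the original question.
sample = """header1, header2, header3, header4, header5
0,0,0,0,0
1,3,2,4,5
3,2,4,5,6
"""

vectors = parse_numeric_csv(sample)
seeds = pick_random_seeds(vectors, k=2, rng=random.Random(0))
print(len(vectors), "vectors of dimension", len(vectors[0]))
print("initial centroids:", seeds)
```

In Mahout these vectors would still have to be written out as a sequence file of vectors before running kmeans; that step is not shown here.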
