Suneel, I ended up using seqdumper on the docIndex file to retrieve the mapping of rowid -> text key. This brought me a lot closer than where I was before!
However, now I have three files (contents here: https://gist.github.com/momer/11289002).

My first thought is to write my own map/reduce jobs to get a dataset (key/value) which has the format:

original_text_key => [term1, term2, term3, term4]

where the terms are selected from the topic which has the highest probability of describing the document. An example:

"com.soryy:http/ruby/api/rails/authentication/2014/03/16/apis-with-devise.html" => ["end","token","authentication","api","resource","def","devise","json","user","x"]

Is there built-in functionality to do this, or is my plan of running another 1 or 2 map-reduce jobs the way to go?

Thank you for your help again,

Mo

On Thu, Apr 24, 2014 at 6:52 PM, Suneel Marthi <[email protected]> wrote:

> RowId creates a matrix and docIndex which r <IntWritable, vectorWritable>
> and <IntWritable, Text> respectively.
>
> Have u looked at LDAPrintTopics.java ?
>
>
> On Thu, Apr 24, 2014 at 7:32 PM, Mohammed Omer
> <[email protected]> wrote:
>
>> Good evening all.
>>
>> This is my first time working with Mahout, and I'm really excited about
>> being able to stand on the shoulders of giants, thanks to your hard work
>> on the project.
>>
>> I'm 90% of the way there with my current Mahout project, but that last 10%
>> is killing me.
>>
>> Code is at https://github.com/momer/mahout_difficulties if you want to
>> skip my explanation and go right to the commands I ran, etc.
>>
>> Using a Lucene index and Mahout's robust CLI, I was able to generate
>> sequence files; sparse vectors; convert those vector keys to integers; and
>> as a result, run the CVB/LDA Algorithm.
>>
>> This worked great, and I was able to dump out the p(doc|topic) and
>> p(topic|term) results; but, I'm having a tough time figuring out how to
>> use the matrix generated by `mahout rowid` to map the documents and their
>> respective topic-assignments/probabilities back to their original text
>> vector keys.
>>
>> Though I'm typically a Rubyist, and having recently (last weekend)
>> read/worked through the entirety of Core Java vol 1, I'm pretty
>> comfortable with Java. I am falling on my face at this last step, though.
>>
>> I appreciate the eyes and help!
>>
>> Thank you again,
>>
>> Mo
>>
>
>
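P.S. For reference, the join described at the top of this message (rowid -> text key, then the most probable topic, then that topic's top terms) can be sketched in a single process before committing to extra map/reduce jobs. This is only a rough sketch in Python, not anything Mahout provides: it assumes the three dumps have already been parsed into in-memory structures, and the function name, parameter names, and sample data below are all hypothetical.

```python
from typing import Dict, List


def top_terms_per_document(
    doc_index: Dict[int, str],               # rowid -> original text key (from docIndex)
    doc_topics: Dict[int, List[float]],      # rowid -> per-topic probabilities for that document
    topic_terms: List[Dict[str, float]],     # topic id -> {term: weight}
    n_terms: int = 10,
) -> Dict[str, List[str]]:
    """Map each original text key to the top terms of its most probable topic."""
    result: Dict[str, List[str]] = {}
    for row_id, text_key in doc_index.items():
        probs = doc_topics.get(row_id)
        if not probs:
            # No topic distribution was found for this row; skip it.
            continue
        # Index of the topic with the highest probability for this document.
        best_topic = max(range(len(probs)), key=probs.__getitem__)
        weights = topic_terms[best_topic]
        # Terms of that topic, ordered from highest to lowest weight.
        ranked = sorted(weights, key=weights.get, reverse=True)
        result[text_key] = ranked[:n_terms]
    return result


if __name__ == "__main__":
    # Hypothetical, hand-made data standing in for the parsed dumps.
    doc_index = {0: "doc/a.html", 1: "doc/b.html"}
    doc_topics = {0: [0.1, 0.9], 1: [0.7, 0.3]}
    topic_terms = [
        {"ruby": 0.5, "rails": 0.3, "gem": 0.2},
        {"token": 0.6, "api": 0.3, "json": 0.1},
    ]
    for key, terms in top_terms_per_document(doc_index, doc_topics, topic_terms, 2).items():
        print(key, "=>", terms)
```

If the docIndex and the topic-term table fit in memory, a single pass like this may be enough; otherwise the same three-way join is what the additional map/reduce jobs would have to implement.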
