Re: Importing tfidf from training set

Andrew Palumbo Tue, 17 Mar 2015 08:08:51 -0700

If you vectorized your training data with seq2sparse, you'll need to use the df-count and dictionary from the training set. You can then tokenize a new document with a Lucene analyzer and count the term frequencies for all terms in the dictionary. You can then use the TFIDF class:


https://github.com/apache/mahout/blob/master/mrlegacy/src/main/java/org/apache/mahout/vectorizer/TFIDF.java

with the corresponding df-count for each term from the training set for the TF-IDF transformation.
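
As a rough illustration of the steps above, here is a minimal plain-Java sketch that does not depend on Mahout. The class and method names (TrainingSetTfIdf, weight, vectorize) are made up for this example. The weighting formula, sqrt(tf) * (ln(numDocs / (df + 1)) + 1), is an assumption modeled on Lucene's classic similarity; please check the linked TFIDF source for the exact computation. Tokens are assumed to come from your analyzer already, and loading the dictionary and df-count from the seq2sparse output is not shown.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TrainingSetTfIdf {

    // Assumed weighting, modeled on Lucene's classic similarity:
    // sqrt(tf) * (ln(numDocs / (df + 1)) + 1).
    static double weight(int tf, long df, long numDocs) {
        return Math.sqrt(tf) * (Math.log(numDocs / (double) (df + 1)) + 1.0);
    }

    // tokens:     analyzer output for the new, unseen document
    // dictionary: term -> term id, taken from the training set
    // dfCounts:   term id -> document frequency, taken from the training set
    // numDocs:    number of documents in the training set
    static Map<Integer, Double> vectorize(List<String> tokens,
                                          Map<String, Integer> dictionary,
                                          Map<Integer, Long> dfCounts,
                                          long numDocs) {
        // Count term frequencies, skipping terms the training set never saw.
        Map<Integer, Integer> termFreqs = new HashMap<>();
        for (String token : tokens) {
            Integer termId = dictionary.get(token);
            if (termId != null) {
                termFreqs.merge(termId, 1, Integer::sum);
            }
        }

        // Weight each term with the df-count from the training set.
        Map<Integer, Double> vector = new HashMap<>();
        for (Map.Entry<Integer, Integer> entry : termFreqs.entrySet()) {
            // Falling back to df = 1 for a missing count is a choice made
            // for this sketch only.
            long df = dfCounts.getOrDefault(entry.getKey(), 1L);
            vector.put(entry.getKey(), weight(entry.getValue(), df, numDocs));
        }
        return vector;
    }
}
```

The point is that only the term frequencies come from the new document; the dictionary, the df-counts and the document count all stay fixed at their training-set values, so the new vector lives in the same space as the training vectors.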




On 03/17/2015 04:46 AM, mw wrote:

Hello,

I am running LDA on a training set to create a topic model.
For calculating p(topic|document) on unseen data, I need to import the inverse document frequency from the training set.
Is there a way to do that in Mahout?

Best,
Max
