Hi, I used Lucene 2.4.0 to compute tf-idf. I'm not sure whether the newer versions have direct methods for getting tf-idf weights as well. This is lengthy, but it might help.
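The weight I compute in the loop below is just tf * ln(N / df). Here is a rough standalone sketch of only that arithmetic, so it can be tried without an index. The class and method names (TfIdfSketch, weight) and the numbers in main are made up for illustration and are not Lucene API. One thing to watch: the ratio has to be computed in floating point, because dividing two ints in Java truncates the result.

```java
// Standalone sketch of the weight formula; TfIdfSketch and weight() are
// illustrative names, not part of Lucene.
class TfIdfSketch {

    // tf:      term frequency in one document (td.freq() in the loop)
    // docFreq: number of documents containing the term (e.docFreq())
    // numDocs: total number of documents (freader.numDocs())
    static double weight(int tf, int docFreq, int numDocs) {
        if (docFreq <= 0 || numDocs <= 0) {
            return 0.0; // term occurs nowhere, so it carries no weight
        }
        // Cast before dividing: int / int would truncate, e.g. 10 / 4 == 2.
        return tf * Math.log((double) numDocs / docFreq);
    }

    public static void main(String[] args) {
        // Hypothetical numbers: term occurs 3 times, in 4 of 10 documents.
        System.out.println(weight(3, 4, 10));
        // A term found in every document gets weight 0, since ln(1) == 0.
        System.out.println(weight(5, 10, 10));
    }
}
```

The guard for docFreq <= 0 is my own choice for the sketch; inside the real loop every term comes from the index, so its document frequency is at least 1.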

import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;
import org.apache.lucene.index.TermEnum;

// Get a TermEnum over all the terms in the index
// (freader is a FilterIndexReader)
TermEnum e = freader.terms();

// Find the total number of docs
int noOfDocs = freader.numDocs();

// Get the TermDocs enum
TermDocs td = freader.termDocs();

// Seek through all the terms
while (e.next()) {
    // Get the term
    Term t = e.term();

    // Only look at the "contents" field. The enum covers every field,
    // so skip the others; otherwise e.docFreq() would belong to a
    // different field than the term we seek to.
    if (!"contents".equals(t.field())) {
        continue;
    }

    // Move to the <document, frequency> pairs for term t
    td.seek(t);

    // Loop through each document containing the term
    while (td.next()) {
        // Cast to double: noOfDocs / e.docFreq() on two ints would truncate
        double weight = td.freq() * Math.log((double) noOfDocs / e.docFreq());
        // do something with the weight
        //......
    }
}

// Release the enums when done
td.close();
e.close();

regards,
Dipesh

On Mon, Feb 2, 2009 at 2:06 AM, Rehan Abdulaziz <abdulazizre...@gmail.com> wrote:

> Hi,
> Is it possible to retrieve the tfidf weights (or relative weight calculated
> from any formula) instead of simple term frequencies through
> getTermFreqVector()? I am more interested in knowing the relative weight of
> each term in each field rather than just the frequency of terms.
>
> Thank you very much
>
> --
> Rehan Abdul Aziz
> Software Engineer
> Scrybe Inc. (www.iscrybe.com)
> Cell: +92-344-5514583
>

--
----------------------------------------
"Help Ever Hurt Never" - Baba