Obtaining IDF values for the terms in a document set

Mike O'Leary Thu, 15 Dec 2011 09:34:47 -0800

We have a large set of documents that we would like to index with a customized 
stopword list. We have run tests by indexing a random set of about 10% of the 
documents, and we'd like to generate a list of the terms in that smaller set 
and their IDF values as a way to create a starter set of stopwords for the 
larger document set by selecting the terms that have the lowest IDF values. 
First of all, is this the best way to create a stopword list? Second, is there 
a straightforward way to generate a list of terms and their IDF values from a 
Lucene index?
Thanks,
Mike
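A minimal sketch of the ranking step described above, under stated assumptions. It does not use Lucene at all. Instead it computes document frequencies over an in-memory sample and applies the classic Lucene 3.x DefaultSimilarity formula, idf = 1 + ln(numDocs / (docFreq + 1)). IDF falls as document frequency rises, so "lowest IDF" is the same as "appears in the most documents". The class and method names (`IdfStopwordCandidates`, `lowestIdfTerms`, and so on) are hypothetical, and the tokenizer (lowercase, split on non-letters/digits) is an assumption. It is not the analyzer your index actually uses. Against a real Lucene 3.x index, the same inputs would come from enumerating the index's terms (for example `IndexReader.terms()` with `TermEnum.docFreq()`) together with `IndexReader.numDocs()`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: ranks terms of a document sample by IDF (lowest first)
// to seed a stopword list. Tokenization here is an assumption, not Lucene's analyzer.
class IdfStopwordCandidates {

    // Classic Lucene 3.x DefaultSimilarity IDF: 1 + ln(numDocs / (docFreq + 1)).
    static double idf(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    // For each term, counts how many documents contain it at least once.
    static Map<String, Integer> documentFrequencies(List<String> docs) {
        Map<String, Integer> df = new HashMap<>();
        for (String doc : docs) {
            Set<String> seenInThisDoc = new HashSet<>();
            for (String tok : doc.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
                // Count each term once per document, skipping empty tokens.
                if (!tok.isEmpty() && seenInThisDoc.add(tok)) {
                    df.merge(tok, 1, Integer::sum);
                }
            }
        }
        return df;
    }

    // Returns up to n terms with the lowest IDF, i.e. those found in the most documents.
    static List<String> lowestIdfTerms(List<String> docs, int n) {
        Map<String, Integer> df = documentFrequencies(docs);
        int numDocs = docs.size();
        List<String> terms = new ArrayList<>(df.keySet());
        // Ascending IDF; ties broken alphabetically so the output is deterministic.
        terms.sort((a, b) -> {
            int cmp = Double.compare(idf(df.get(a), numDocs), idf(df.get(b), numDocs));
            return cmp != 0 ? cmp : a.compareTo(b);
        });
        return new ArrayList<>(terms.subList(0, Math.min(n, terms.size())));
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
            "The index holds the documents.",
            "The stopword list is customized.",
            "A random sample of the documents.");
        Map<String, Integer> df = documentFrequencies(sample);
        for (String term : lowestIdfTerms(sample, 3)) {
            System.out.printf("%s\tdf=%d\tidf=%.3f%n",
                term, df.get(term), idf(df.get(term), sample.size()));
        }
    }
}
```

On this three-document sample the top candidates come out as "the" (in all three documents), then "documents" (in two), then the alphabetically first of the single-occurrence terms. Because only the ordering matters for picking stopword candidates, sorting by descending document frequency would give the same list. The IDF values are printed only to match the question's framing.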
