RE: Obtaining IDF values for the terms in a document set

2011-12-15 Thread Burton-West, Tom
Tom Burton-West
http://www.hathitrust.org/blogs/large-scale-search
-Original Message-
From: Mike O'Leary [mailto:tmole...@uw.edu]
Sent: Thursday, December 15, 2011 12:34 PM
To: java-user@lucene.apache.org
Subject: Obtaining IDF values for the terms in a document set
We have a large s

RE: Obtaining IDF values for the terms in a document set

2011-12-15 Thread Mike O'Leary
all of the terms that occur in the document set and obtain their IDF values.
Thanks,
Mike
-Original Message-
From: Simon Willnauer [mailto:simon.willna...@googlemail.com]
Sent: Thursday, December 15, 2011 11:44 AM
To: java-user@lucene.apache.org
Subject: Re: Obtaining IDF values for the

Re: Obtaining IDF values for the terms in a document set

2011-12-15 Thread Simon Willnauer
On Thu, Dec 15, 2011 at 6:33 PM, Mike O'Leary wrote:
> We have a large set of documents that we would like to index with a
> customized stopword list. We have run tests by indexing a random set of about
> 10% of the documents, and we'd like to generate a list of the terms in that
> smaller set

Obtaining IDF values for the terms in a document set

2011-12-15 Thread Mike O'Leary
We have a large set of documents that we would like to index with a customized stopword list. We have run tests by indexing a random set of about 10% of the documents, and we'd like to generate a list of the terms in that smaller set and their IDF values as a way to create a starter set of stopw
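
For readers skimming this thread, the idea in the question (rank terms by IDF over a sample and treat the lowest-IDF terms as stopword candidates) can be sketched outside of Lucene. The snippet below is only an illustration in plain Python, not the Lucene API and not the approach the replies recommend. The function names, the toy documents, and the `max_idf` cutoff are all hypothetical. The formula idf = 1 + ln(N / (df + 1)) is the one Lucene's DefaultSimilarity used at the time; any other IDF variant would rank terms the same way.

```python
import math
from collections import Counter


def document_frequencies(docs):
    """Count, for each term, how many documents contain it at least once."""
    df = Counter()
    for tokens in docs:
        # set() so that a term repeated inside one document is counted once
        df.update(set(tokens))
    return df


def idf_table(docs):
    """Return {term: idf} with idf = 1 + ln(N / (df + 1)) (assumed formula)."""
    n = len(docs)
    df = document_frequencies(docs)
    return {term: 1.0 + math.log(n / (count + 1)) for term, count in df.items()}


def stopword_candidates(docs, max_idf):
    """Terms whose IDF is at or below max_idf, lowest IDF first.

    max_idf is a hypothetical cutoff; a real list would be reviewed by hand.
    """
    idf = idf_table(docs)
    hits = [term for term, value in idf.items() if value <= max_idf]
    return sorted(hits, key=lambda term: (idf[term], term))


# Toy stand-in for the 10% sample; tokens are already split and lowercased.
sample_docs = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "a dog and a cat".split(),
    "the end".split(),
]

if __name__ == "__main__":
    for term in stopword_candidates(sample_docs, max_idf=1.3):
        print(term, round(idf_table(sample_docs)[term], 4))
```

Note that a content word such as "cat" scores as low as "the" in this toy sample, which is why a low-IDF list is only a starter set and still needs manual review.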