On 08/15/2012 02:29 PM, Erick Erickson wrote:
> I don't see how you could without indexing everything first, since you can't know what the most frequent terms are until you've processed all your documents.
Exactly.
> If you know these terms in advance, it seems like you could just call them stopwords and use the common stopword processing. If you have to examine your corpus in the first place, it seems like you could do something with term frequencies to extract the most common terms from your index, then re-index all your data with those terms as stopwords.
It's a possibility, but that would require reindexing, which would take a long time; hence my desire to try to edit the individual documents.
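The two-pass approach Erick describes (first find the most frequent terms across the whole corpus, then re-index with those terms treated as stopwords) can be sketched in plain Java. This is a minimal illustration under stated assumptions, not Lucene's API: the class and method names are hypothetical, the tokenizer is a naive stand-in for a real Lucene analyzer, and "frequency" is taken to mean document frequency (how many documents contain a term). Counting total occurrences instead would be an equally valid reading of "term frequencies". The sketch also shows why the first pass must see every document before any stopword decision can be made.

```java
import java.util.*;
import java.util.stream.*;

/** Hypothetical helper, not part of Lucene: derives a stopword set from corpus statistics. */
public class FrequentTermStopwords {

    // Naive tokenizer (assumption): lowercase, split on runs of non-letter/non-digit chars.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Pass 1: document frequency of every term; needs the whole corpus before deciding anything.
    static Map<String, Integer> docFreqs(List<String> docs) {
        Map<String, Integer> df = new HashMap<>();
        for (String doc : docs) {
            // Count each term at most once per document.
            for (String term : new HashSet<>(tokenize(doc))) {
                df.merge(term, 1, Integer::sum);
            }
        }
        return df;
    }

    // Select the topN terms by document frequency; ties are broken alphabetically for determinism.
    static Set<String> topTerms(Map<String, Integer> df, int topN) {
        return df.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Integer>comparingByKey()))
                .limit(topN)
                .map(Map.Entry::getKey)
                .collect(Collectors.toCollection(LinkedHashSet::new));
    }

    // Pass 2 (the "re-index"): drop stopwords from each document's token stream.
    static List<String> filter(String doc, Set<String> stopwords) {
        return tokenize(doc).stream()
                .filter(t -> !stopwords.contains(t))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> docs = List.of(
                "the quick brown fox",
                "the lazy dog",
                "the fox and the dog");
        Set<String> stopwords = topTerms(docFreqs(docs), 1);
        System.out.println("stopwords: " + stopwords);
        for (String d : docs) {
            System.out.println(filter(d, stopwords));
        }
    }
}
```

In a real Lucene setup the first pass would read document frequencies from the existing index rather than re-tokenizing raw text, and the resulting set would be handed to a stopword-aware analyzer for the second pass. The exact classes and constructors for that depend on the Lucene version in use, which is why the sketch stays in plain Java. Either way, the second pass still touches every document, which is the reindexing cost raised above.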