Re: easy way to figure out most common tokens?

Shaya Potter Wed, 15 Aug 2012 11:43:21 -0700

On 08/15/2012 02:29 PM, Erick Erickson wrote:

> I don't see how you could without indexing everything first,
> since you can't know what the most frequent terms are until
> you've processed all your documents...

Exactly.

> If you know these terms in advance, it seems like you could
> just call them stopwords and use the common stopword
> processing.
>
> If you have to examine your corpus in the first place,
> it seems like you could do something with term
> frequencies to extract the most common terms from
> your index, then re-index all your data with those terms
> as stopwords.

It's a possibility, but that would require reindexing, which would take a long time, hence my desire to try to edit the individual documents.
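
For illustration only, here is a rough sketch of the "extract the most common terms" step discussed above, in plain Java with no Lucene calls. The class name CommonTerms, the method topTerms, and the naive tokenization are all assumptions made for this example; against a real index the counts would presumably come from the index's own term statistics rather than from re-reading the raw text.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of Lucene: finds stopword candidates
// by counting token occurrences across a collection of documents.
class CommonTerms {

    // Returns up to n tokens with the highest total occurrence count.
    // Ties are broken alphabetically so the result is deterministic.
    static List<String> topTerms(List<String> docs, int n) {
        Map<String, Integer> counts = new HashMap<>();
        for (String doc : docs) {
            // Naive tokenization (an assumption): lowercase, then split on
            // any run of characters that are not letters or digits.
            String[] tokens = doc.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+");
            for (String tok : tokens) {
                if (!tok.isEmpty()) {
                    counts.merge(tok, 1, Integer::sum);
                }
            }
        }

        // Sort by descending count, then by token text.
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });

        List<String> top = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            top.add(entries.get(i).getKey());
        }
        return top;
    }
}
```

The resulting list could then be used as a stopword set. Note that this only identifies the candidates; it does not avoid the reindexing cost raised above.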


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
