Well, I expect that defining "less common" is tricky and doesn't lend itself to a canned answer <G>. Would it work to create your own list of stop words (possibly a very large one) to use for indexing and/or searching? This would simply exclude the words you consider too common, leaving only the "less common" ones (as you define them). StandardAnalyzer, for instance, can take a File of stop words in one of its constructors.
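In case it helps, here is a rough sketch of one way you might build such a list: treat any word that shows up in more than some fraction of your documents as a stop word. This is plain Java and does not call Lucene at all; the class name StopWordSketch, the helper methods, the 0.75 cutoff, and the sample documents are all made up for illustration, so take it as a starting point rather than the way to do it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of Lucene: derives a stop word list
// from how many documents each word appears in.
public class StopWordSketch {

    // Lowercase the text and split it on runs of non-word characters.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("\\W+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // A word becomes a stop word when it occurs in more than
    // maxDocFraction of the documents (an assumed definition of "too common").
    static Set<String> buildStopWords(List<String> docs, double maxDocFraction) {
        Map<String, Integer> docFreq = new HashMap<>();
        for (String doc : docs) {
            // Count each word at most once per document.
            for (String term : new HashSet<>(tokenize(doc))) {
                docFreq.merge(term, 1, Integer::sum);
            }
        }
        double limit = maxDocFraction * docs.size();
        Set<String> stopWords = new TreeSet<>();
        for (Map.Entry<String, Integer> e : docFreq.entrySet()) {
            if (e.getValue() > limit) {
                stopWords.add(e.getKey());
            }
        }
        return stopWords;
    }

    // Keep only the tokens that are not in the stop word set.
    static List<String> removeStopWords(String text, Set<String> stopWords) {
        List<String> kept = new ArrayList<>();
        for (String t : tokenize(text)) {
            if (!stopWords.contains(t)) {
                kept.add(t);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up sample documents.
        List<String> docs = Arrays.asList(
                "You can rent bikes from the shop",
                "Bikes can be shipped from the warehouse",
                "Tax forms can be downloaded from the site",
                "You can order coffee from the cafe");

        Set<String> stopWords = buildStopWords(docs, 0.75);
        System.out.println(stopWords);                                // [can, from, the]
        System.out.println(removeStopWords(docs.get(0), stopWords)); // [you, rent, bikes, shop]
    }
}
```

You could then write the resulting words out to a file and hand that to the analyzer. I'd check what file layout your Lucene version expects before relying on it.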
Erick

On 8/7/06, Chad Hardin <[EMAIL PROTECTED]> wrote:
Hi all,

I'm new to Lucene but I'm loving it! I'm writing a prototype that links documents together based upon similarities. Obviously the first thing I did was use MoreLikeThis. However, it seems to be finding matches based upon words that are too common, in this case the words "from" and "can", and it seems to be missing matches on the terms I would expect (in this case, documents about "bikes"). It seems I need a more custom-tailored Filter that only passes through less-common words. Does something like this already exist?

Thanks,
Chad