Re: More like this returning similarities that are too generic

2006-08-08 Thread Chad Hardin
You're so right! I'm totally new to Lucene (and text analysis, searching, etc.), but now that you showed me I "get it". Thank you so much for your reply. Chad On Aug 8, 2006, at 12:45 AM, Chris Hostetter wrote: I've never used MoreLikeThis myself, but based on how i know it works, your

Re: More like this returning similarities that are too generic

2006-08-08 Thread Chris Hostetter
I've never used MoreLikeThis myself, but based on how I know it works, your problem probably has more to do with the size of your test corpus and the frequency of the words in your docs than with the size of the docs themselves. : There's still the issue of the queries from MoreLikeThis not : return
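
To make the corpus-size point concrete, here is a minimal sketch in plain Java (no Lucene dependency). It assumes an inverse-document-frequency weight of the form 1 + ln(numDocs / (docFreq + 1)), which is the shape of Lucene's classic similarity but should be treated as an assumption here, not as what MoreLikeThis literally computes. The class name, the four sample documents, and the document counts for the larger corpus are all made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

class TermScoreSketch {
    // Assumed idf shape: 1 + ln(numDocs / (docFreq + 1)).
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    // Number of documents that contain the term at least once.
    static int docFreq(String term, List<String> corpus) {
        String wanted = term.toLowerCase();
        int count = 0;
        for (String doc : corpus) {
            if (Arrays.asList(doc.toLowerCase().split("\\W+")).contains(wanted)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Hypothetical four-document corpus, invented for this example.
        List<String> tiny = Arrays.asList(
            "bikes are handy for getting from place to place",
            "bikes come in many sizes",
            "trains run from city to city",
            "cats sleep all day");
        int n = tiny.size();

        // In a corpus this small, "from" and "bikes" appear in the same
        // number of documents, so they receive the same weight.
        System.out.println("tiny  from:  " + idf(docFreq("from", tiny), n));
        System.out.println("tiny  bikes: " + idf(docFreq("bikes", tiny), n));

        // Hypothetical counts for a larger corpus: the common word is now
        // weighted far below the rarer one.
        System.out.println("large from:  " + idf(6000, 10000));
        System.out.println("large bikes: " + idf(50, 10000));
    }
}
```

With only a handful of documents, document frequency cannot tell a function word from a topical one, which would explain why generic terms end up driving the similarity query.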

Re: More like this returning similarities that are too generic

2006-08-07 Thread Chad Hardin
Thank you Erick, that was what I anticipated would be necessary. There's still the issue of the queries from MoreLikeThis not returning results for terms I had expected ("bikes"). For example, I have these four very short documents: "bikes are a handy tool for getting from diffrent locations

Re: More like this returning similarities that are too generic

2006-08-07 Thread Erick Erickson
Well, I expect that defining "less common" is tricky and doesn't lend itself to a canned answer. Would it work to create your own list of stop words (possibly very large) to use for indexing and/or searching? This would simply exclude the "less common" words (as you define them). StandardAnalyzer
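
The custom stop-word idea can be sketched in plain Java as follows. This is only an illustration of the filtering step, not Lucene's analyzer API: the class name, the `filter` helper, and the stop-word list are hypothetical, and in a real index the same list would need to be applied consistently at indexing and search time.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class StopWordSketch {
    // Hypothetical stop list; a real one could be much larger.
    static final Set<String> STOP_WORDS =
        new HashSet<>(Arrays.asList("from", "are", "a", "for", "the", "to"));

    // Lowercase the text, split on non-word characters, and drop
    // empty tokens and anything found in the stop list.
    static List<String> filter(String text, Set<String> stopWords) {
        List<String> kept = new ArrayList<>();
        for (String token : text.toLowerCase().split("\\W+")) {
            if (!token.isEmpty() && !stopWords.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Only the tokens that survive filtering would be indexed or queried.
        System.out.println(
            filter("Bikes are handy for getting from place to place", STOP_WORDS));
    }
}
```

Because the excluded words never reach the index, they cannot be picked up as "interesting" terms when looking for similar documents.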

More like this returning similarities that are too generic

2006-08-07 Thread Chad Hardin
Hi all, I'm new to Lucene but I'm loving it! I'm writing a prototype that links documents together based upon similarities. Obviously the first thing I did was use MoreLikeThis. However, it seems to be finding matches based upon words that are too common, in this case the words "from"