Re: Lucene query with long strings

2010-03-24 Thread Grant Ingersoll
On Mar 24, 2010, at 9:20 AM, Shashi Kant wrote:
> Add the common terms such as "University", "School", "Medicine",
> "Institute" etc. to stopwords list, so you are left with Stanford,
> "Palo Alto" etc.

I don't know if I would remove them, but you might consider using the CommonGram or n-gram a

Re: Lucene query with long strings

2010-03-24 Thread Shashi Kant
Add the common terms such as "University", "School", "Medicine", "Institute" etc. to stopwords list, so you are left with Stanford, "Palo Alto" etc. Then use Ahmet's suggestion of calling BooleanQuery.setMinimumNumberShouldMatch() with (say) 75% of the query string length. Finally, if you wish to
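
To make the stopword-plus-threshold idea concrete, here is a minimal sketch in plain Python. It is not the Lucene API; it only mimics the matching logic. The stopword set, the function names, and the reading of "75% of the query string length" as 75% of the distinct remaining query terms are all assumptions made for illustration.

```python
import math
import re

# Hypothetical stopword list, built from the terms named in the message above.
STOPWORDS = {"university", "school", "medicine", "institute", "of"}

def tokenize(text):
    """Lowercase, split on non-alphanumeric characters, and drop stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]

def matches_min_should(query, document, fraction=0.75):
    """Return True if at least `fraction` of the distinct query terms occur in the document.

    Assumption: the threshold is counted in terms, rounded up.
    """
    query_terms = set(tokenize(query))
    if not query_terms:
        return False
    doc_terms = set(tokenize(document))
    required = math.ceil(fraction * len(query_terms))
    return len(query_terms & doc_terms) >= required
```

With the stopwords removed, a query such as "Harvard Medical School, Boston" no longer matches a Stanford affiliation just because both contain "School".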

RE: Lucene query with long strings

2010-03-23 Thread Steven A Rowe
Hi Aaron, Your "false positives" comments point to a mismatch between what you're currently asking Lucene for (any document matching any one of the terms in the query) and what you want (only fully "correct" matches). You need to identify the terms of the query that MUST match and tell Lucene
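
The difference between "any one term matches" and "these terms MUST match" can be sketched as follows. This is plain Python that imitates required and optional clauses, not Lucene code; the function names and the sample affiliation strings are hypothetical.

```python
import re

def terms(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def matches_required(document, must, should=(), min_should=0):
    """Return True only if every `must` term is present and at least
    `min_should` of the optional `should` terms are present."""
    doc = terms(document)
    if not all(t.lower() in doc for t in must):
        return False
    hits = sum(1 for t in should if t.lower() in doc)
    return hits >= min_should
```

Here a document that shares only optional terms such as "School" is rejected, because the required term is missing.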

Re: Lucene query with long strings

2010-03-23 Thread Ahmet Arslan
> hi all, I have been playing with Lucene for a while now, but stuck on a
> perplexing issue.
>
> I have an index, with a field "Affiliation", some example values are:
>
> - "Stanford University School of Medicine, Palo Alto, CA USA",
> - "Institute of Neurobiology, School of Medicine, Sta