On Mar 24, 2010, at 9:20 AM, Shashi Kant wrote:

> Add the common terms such as "University", "School", "Medicine",
> "Institute" etc. to stopwords list, so you are left with Stanford,
> "Palo Alto" etc.

I don't know if I would remove them, but you might consider using the CommonGrams or n-gram approach, whereby you associate these "stop words" with the words around them.

> Then use Ahmet's suggestion of using a booleanquery
> .setMinimumNumberShouldMatch() to (say) 75% of the query string
> length.
>
> Finally, if you wish to be very precise, you can loop through the hits
> collector and use a string comparison algorithm like Jaro-Winkler,
> Levenshtein etc. for a second-level filter.

Note, this approach will be slow.

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem using Solr/Lucene: http://www.lucidimagination.com/search
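
For illustration only, here is a minimal sketch of the second-level filter discussed above, assuming the candidate names have already been pulled out of the hits as plain strings. It uses only the JDK, not Lucene. The class name SecondPassFilter, its methods, and the 0.85 threshold are hypothetical choices for this example and are not part of any Lucene API. It computes a normalized Levenshtein similarity and keeps only candidates at or above the threshold. Every candidate costs a full edit-distance computation, which is why this step is slow on large hit sets.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: re-checks search hits with an edit-distance comparison.
class SecondPassFilter {

    // Classic two-row dynamic-programming Levenshtein distance.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Similarity in [0, 1]: 1.0 means identical, 0.0 means nothing in common.
    static double similarity(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        if (maxLen == 0) {
            return 1.0;
        }
        return 1.0 - (double) levenshtein(a, b) / maxLen;
    }

    // Keeps candidates whose case-insensitive similarity to the query
    // is at least the given threshold.
    static List<String> filter(String query, List<String> candidates, double threshold) {
        List<String> kept = new ArrayList<>();
        String q = query.toLowerCase();
        for (String candidate : candidates) {
            if (similarity(q, candidate.toLowerCase()) >= threshold) {
                kept.add(candidate);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> hits = List.of(
                "Stanford University", "Stanford Univeristy", "Harvard University");
        System.out.println(filter("Stanford University", hits, 0.85));
    }
}
```

The threshold is a tuning knob: a lower value tolerates more typos but lets more near-misses through. Jaro-Winkler could be swapped in for similarity() without changing the rest.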