On Mar 24, 2010, at 9:20 AM, Shashi Kant wrote:

> Add the common terms such as "University", "School", "Medicine",
> "Institute" etc. to stopwords list, so you are left with Stanford,
> "Palo Alto" etc.

I don't know if I would remove them, but you might consider the
CommonGrams or n-gram approach, whereby you index these "stop words"
together with the words around them instead of dropping them.
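
To make the idea concrete, here is a rough sketch in plain Java. It is
not Lucene's or Solr's actual filter; the class name, the "_" separator
and the hard-coded term list (taken from the words mentioned above) are
assumptions for illustration only. Every word is kept as a token, and a
combined two-word token is added whenever either word of an adjacent
pair is a common term.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class CommonGramsSketch {
    // Illustrative list of common terms; in practice derive it from your own data.
    static final Set<String> COMMON = new HashSet<>(
            Arrays.asList("university", "school", "medicine", "institute"));

    // Emits every word, plus a "left_right" token for each adjacent pair
    // in which at least one of the two words is a common term.
    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        String trimmed = text.trim().toLowerCase();
        if (trimmed.isEmpty()) {
            return out;
        }
        String[] words = trimmed.split("\\s+");
        for (int i = 0; i < words.length; i++) {
            out.add(words[i]);
            if (i + 1 < words.length
                    && (COMMON.contains(words[i]) || COMMON.contains(words[i + 1]))) {
                out.add(words[i] + "_" + words[i + 1]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("Stanford University School of Medicine"));
        System.out.println(tokens("Palo Alto"));
    }
}
```

With tokens like "stanford_university" in the index, a common word
still contributes to matching, but only in the context of its
neighbor, so "University" alone no longer matches every institution.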

> 
> Then use Ahmet's suggestion of setting
> BooleanQuery.setMinimumNumberShouldMatch() to (say) 75% of the query
> string length.
> 
> Finally, if you wish to be very precise, you can loop through the hits
> collector and use a string comparison algorithm like Jaro-Winkler,
> Levenshtein etc. for a second-level filter.

Note that this approach will be slow, since it runs a string
comparison against every hit you collect.
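
For reference, a minimal sketch of the two steps quoted above, in plain
Java with no Lucene dependency. The class and method names are
hypothetical. I am also assuming that "75% of the query string length"
means 75% of the optional clauses in the query, which is the kind of
count you would pass to setMinimumNumberShouldMatch(). Levenshtein
stands in for whichever string metric you prefer.

```java
import java.util.ArrayList;
import java.util.List;

class SecondLevelFilter {
    // Number of optional clauses that must match, e.g. 75% of 4 terms -> 3.
    static int minShouldMatch(int clauseCount, double fraction) {
        return Math.max(1, (int) Math.ceil(clauseCount * fraction));
    }

    // Classic dynamic-programming Levenshtein distance using two rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Normalizes the distance to a similarity in [0, 1]; 1.0 means identical.
    static double similarity(String a, String b) {
        int max = Math.max(a.length(), b.length());
        return max == 0 ? 1.0 : 1.0 - (double) levenshtein(a, b) / max;
    }

    // Keeps only the hits whose case-insensitive similarity to the query
    // reaches the threshold. One full comparison per hit, hence the cost.
    static List<String> filter(String query, List<String> hits, double threshold) {
        List<String> kept = new ArrayList<>();
        String q = query.toLowerCase();
        for (String hit : hits) {
            if (similarity(q, hit.toLowerCase()) >= threshold) {
                kept.add(hit);
            }
        }
        return kept;
    }
}
```

Each comparison costs time proportional to the product of the two
string lengths, so if you go this route, cap the number of hits you
post-filter (say, the top few hundred) rather than walking the whole
result set.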

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/


