Re: Score exact matches higher than matches that match analysed text but not original text

Paul Taylor Tue, 10 Jan 2012 04:49:15 -0800

On 10/01/2012 10:18, Ian Lea wrote:

If a term has an accent, add both accented and unaccented versions at
index and search time.


So in your example your default field would contain

República Republica

and a search for "República" would expand to "República Republica" and
match both and score higher than a search for "Republica" which would
just match the unaccented version.
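
As a rough sketch of that expansion step (not code from this thread), the following uses only the JDK's `java.text.Normalizer` to produce the accented and unaccented variants of a term. The class and method names are made up for illustration, and in a real index this logic would presumably live in an analyzer or token filter rather than a standalone helper.

```java
import java.text.Normalizer;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper, not part of Lucene.
final class AccentExpander {

    // Decompose to NFD, then drop the combining marks (accents).
    static String stripAccents(String term) {
        String decomposed = Normalizer.normalize(term, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    // Returns the original term plus its unaccented form.
    // An unaccented input yields a single variant, because the set
    // collapses the duplicate.
    static Set<String> expand(String term) {
        Set<String> variants = new LinkedHashSet<>();
        variants.add(term);
        variants.add(stripAccents(term));
        return variants;
    }
}
```

With this, an accented term expands to two variants and an unaccented term to one, which is the asymmetry the suggestion above relies on.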

Thanks, that is a solution, but a side effect would be that if searching for Republica, a document containing "Banana Republica" would score as well as "República" (because República expands to "Republica República"), as in both cases the search term would match one of two terms, whereas I would want it to score República higher.

I don't really want to mess with the matching. I'm happy with what it matches and the order the results are returned in, but the trouble is that because we are only searching short amounts of text, not large chunks of text, we typically end up with many matches having the same score, and I would like to just improve the scoring aspect so that matches that appear better to the user are higher up in the search results.


República

It's not quite synonyms but you could borrow synonym code from
somewhere. There's stuff in the Lucene contrib area and in Lucene in
Action (LIA) and maybe elsewhere. I've used the LIA code to do
something similar.


An alternative would be to store accented versions in a separate field
and add a query for that field to the mix if you have accented terms.
You could boost that part of the query.
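
One way to picture the separate-field idea is as a query string in the classic Lucene query parser syntax, where `field:term^boost` boosts a clause. The sketch below is only an assumption about how it could be wired up: the field names `name` and `name_exact`, the lowercasing, and the helper class are all hypothetical, and it assumes a single term with no query-syntax special characters.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical helper that builds a query string. The field names
// "name" (accent-folded) and "name_exact" (accents kept) are assumptions.
final class ExactFieldQuery {

    // Lowercase, decompose to NFD, and drop combining marks.
    static String fold(String term) {
        String lower = term.toLowerCase(Locale.ROOT);
        return Normalizer.normalize(lower, Normalizer.Form.NFD)
                .replaceAll("\\p{M}+", "");
    }

    // Always query the folded field; add a boosted clause on the exact
    // field only when the term actually carries an accent.
    static String build(String term, float boost) {
        String lower = term.toLowerCase(Locale.ROOT);
        String folded = fold(term);
        StringBuilder query = new StringBuilder("name:").append(folded);
        if (!folded.equals(lower)) {
            query.append(" name_exact:").append(lower)
                 .append('^').append(boost);
        }
        return query.toString();
    }
}
```

Because the boosted clause is optional, documents matching only the folded field are still returned. They just score lower than documents that also match the accented form.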

Also, the accent case was the easiest to explain, but I also want to apply this in different cases such as misspellings, i.e. if there are two documents in the index with the values


James Clarke
David Clarke

And I search for

Dave Clarke

I would like David Clarke to score higher than James Clarke, because the first name is nearly the same, but at the moment they both score the same because they just match on the second value.

I don't want to introduce synonyms or wildcard searches because I think it will return far too many false positives, and also search is not restricted to Latin charsets. But having done a search that returns both

James Clarke
David Clarke

I can then safely adjust the scores. Maybe I should just try my original idea.
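
To make the post-search adjustment concrete, here is a minimal sketch that leaves the search engine's scores alone and only breaks ties by plain Levenshtein similarity to the original query text. Nothing here is a Lucene API: `Hit`, `TieBreakRescorer` and the choice of edit distance as the similarity measure are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

// Hypothetical post-processing step applied to hits already returned
// by the search.
final class TieBreakRescorer {

    static final class Hit {
        final String value;   // stored field value, e.g. "David Clarke"
        final double score;   // score as returned by the search
        Hit(String value, double score) { this.value = value; this.score = score; }
    }

    // Classic two-row dynamic-programming Levenshtein distance.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }

    // 1.0 for identical strings (ignoring case), approaching 0.0 as they diverge.
    static double similarity(String a, String b) {
        String x = a.toLowerCase(Locale.ROOT);
        String y = b.toLowerCase(Locale.ROOT);
        int longest = Math.max(x.length(), y.length());
        return longest == 0 ? 1.0 : 1.0 - (double) levenshtein(x, y) / longest;
    }

    // Order by original score (descending); among equal scores, put the
    // hit most similar to the query text first.
    static List<Hit> rerank(String query, List<Hit> hits) {
        Comparator<Hit> byScore = Comparator.comparingDouble((Hit h) -> h.score);
        Comparator<Hit> bySimilarity =
                Comparator.comparingDouble((Hit h) -> similarity(query, h.value));
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(byScore.reversed().thenComparing(bySimilarity.reversed()));
        return sorted;
    }
}
```

For the query "Dave Clarke", the edit distance to "David Clarke" is 2 and to "James Clarke" is 3, so David Clarke sorts first when the two scores tie. The comparison works on UTF-16 `char` units, so text with characters outside the Basic Multilingual Plane would need a code-point based variant.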



Paul

