My analyser strips out accents, as these are often not entered correctly. So assume there are two documents in the database whose default field contains
República
Republica

A search for República or Republica will return both results, each with a score of 1.

It's correct that they are both returned, but it would be really nice if the scoring stage could recognise that when I search for República, the document containing República is a slightly better match than the other one and should score slightly higher, and vice versa.

Is there any way to do this in Lucene? Alternatively, I thought about adjusting the scores returned by Lucene: when multiple results have the same score, count the number of matching letters and increase the score based on how many letters match, but only by enough that it stays lower than any result that Lucene scored higher. I also realise that this makes sense when searching just one field but gets more complex when the query searches over multiple fields. In my case, searching for artists/bands (music), I think I would only apply the boost if the artist name was one of the search fields.
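
To make the post-processing idea concrete, here is a rough sketch in plain Python. It does not use any Lucene API. The names break_ties, exact_match_fraction and max_bonus are made up for illustration, and it assumes the hits arrive as (field value, score) pairs already sorted by score, with ties being exactly equal scores.

```python
import unicodedata
from itertools import groupby


def exact_match_fraction(query, value):
    """Fraction of character positions where query and value agree exactly,
    accents included (case is ignored)."""
    # Normalise to NFC so an accented letter is one code point on both sides.
    q = unicodedata.normalize("NFC", query.casefold())
    v = unicodedata.normalize("NFC", value.casefold())
    longest = max(len(q), len(v))
    if longest == 0:
        return 0.0
    same = sum(1 for a, b in zip(q, v) if a == b)
    return same / longest


def break_ties(query, hits, max_bonus=0.01):
    """hits: list of (value, score) sorted by score, highest first.
    Returns a new list where tied hits get a small bonus for exact matches.
    max_bonus is an arbitrary illustrative value, not a Lucene setting."""
    result = []
    higher = None  # original score of the group ranked just above
    for score, group in groupby(hits, key=lambda hit: hit[1]):
        group = list(group)
        if len(group) == 1:
            result.extend(group)  # no tie, leave the score alone
        else:
            # Cap the bonus at half the gap to the next higher score, so a
            # boosted hit never overtakes a hit that was scored higher.
            cap = max_bonus if higher is None else min(max_bonus, (higher - score) / 2)
            for value, _ in group:
                bonus = cap * exact_match_fraction(query, value)
                result.append((value, score + bonus))
        higher = score
    # sorted() is stable, so hits that are still tied keep their order.
    return sorted(result, key=lambda hit: hit[1], reverse=True)
```

With the two documents above, break_ties("República", [("Republica", 1.0), ("República", 1.0)]) puts República first, and searching for Republica puts Republica first. For a multi-field query, the same function could be applied only when the artist name is one of the searched fields.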

Paul
