Grant Ingersoll wrote:
First off, I would start by using Lucene's explain functionality to see
why one result appears before the other. The explain method will tell
you all the factors that go into scoring each of your results, as it
goes beyond just term frequency.
Finally, you might find
http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Debugging-Relevance-Issues-Search helpful.
One of the things I often tell people is that if you know a certain
result needs to be in a certain place for a certain query, just put it
there. Otherwise, don't sweat relative position too much unless you
have a result that you think is good buried (i.e. page 5) deep down in
your results.
To elaborate on what Grant hinted at ... If the top-N results are good
enough, but you are concerned about their ordering, a trick that I often
find useful is to simply implement an arbitrary re-sorting of top-N
results, according to your rules of preference (business rules, or
heuristics). This way you can avoid the overfitting or doing endless
tweaking, and still get the ranking that makes sense to your users.
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org