Grant Ingersoll wrote:
First off, I would start by using Lucene's explain functionality to see why one result appears before the other. The explain method will tell you all the factors that go into scoring each of your results, as it goes beyond just term frequency.

Finally, you might find http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Debugging-Relevance-Issues-Search helpful. One of the things I often tell people is that if you know a certain result needs to be in a certain place for a certain query, just put it there. Otherwise, don't sweat relative position too much unless a result that you think is good is buried deep down in your results (e.g. on page 5).

To elaborate on what Grant hinted at: if the top-N results are good enough but you are concerned about their ordering, a trick that I often find useful is to simply re-sort the top-N results according to your own rules of preference (business rules or heuristics). This way you avoid overfitting and endless tweaking of the scoring, and still get a ranking that makes sense to your users.
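
As a rough illustration of the re-sorting trick, here is a minimal sketch in plain Java. It does not use any Lucene classes: the Hit class, its "promoted" flag, the reRank method and the PROMOTED_FIRST comparator are all hypothetical names made up for this example, and "promoted hits first, then by score" is just one possible business rule. In practice you would first copy the document ids and scores of your top-N hits into something like Hit.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Sketch: re-order only the first n hits by a custom preference rule. */
final class TopNReRanker {

    /** Hypothetical hit holder (not a Lucene type). */
    public static final class Hit {
        public final String id;        // document identifier
        public final float score;      // relevance score from the search engine
        public final boolean promoted; // assumed business flag, e.g. "featured"

        public Hit(String id, float score, boolean promoted) {
            this.id = id;
            this.score = score;
            this.promoted = promoted;
        }
    }

    /** Example rule: promoted hits first, then higher score first. */
    public static final Comparator<Hit> PROMOTED_FIRST = new Comparator<Hit>() {
        @Override
        public int compare(Hit a, Hit b) {
            if (a.promoted != b.promoted) {
                return a.promoted ? -1 : 1;
            }
            return Float.compare(b.score, a.score);
        }
    };

    /**
     * Returns a new list in which the first n hits are re-sorted by the
     * given preference; hits after position n keep their original order.
     */
    public static List<Hit> reRank(List<Hit> hits, int n, Comparator<Hit> preference) {
        int cut = Math.min(Math.max(n, 0), hits.size());
        List<Hit> result = new ArrayList<Hit>(hits.subList(0, cut));
        result.sort(preference); // List.sort is stable, so ties keep engine order
        result.addAll(hits.subList(cut, hits.size()));
        return result;
    }
}
```

Calling TopNReRanker.reRank(hits, 10, TopNReRanker.PROMOTED_FIRST) leaves the scoring itself untouched and only changes the presentation order of the first ten hits, so the set of results on the first page stays the same.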


--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

