The feedback came directly from customers and customer facing support folks. Here is an example of a query with keywords: nurse, rn, nursing, hospital. The top 2 hits have scores of 26.86348 and 26.407215. To the customer, both results were equally relevant because all of their keywords were in the documents. For this application, the subtleties of TF/IDF are not appreciated by the end user ;-). Here are the Explanations for the scores (I hope they are readable):
Doc 1: 26.86348 sum of: 26.86348 product of: 33.57935 sum of: 10.403484 weight(contents:nurse in 110320), product of: 0.30413723 queryWeight(contents:nurse), product of: 4.8375363 idf(contents: nurse=9554) 0.06287027 queryNorm 34.206547 fieldWeight(contents:nurse in 110320), product of: 7.071068 btq, product of: 1.4142135 tf(phraseFreq=2.0) 5.0 scorePayload(...) 4.8375363 idf(contents: nurse=9554) 1.0 fieldNorm(field=contents, doc=110320) 11.005695 weight(contents:rn in 110320), product of: 0.31281596 queryWeight(contents:rn), product of: 4.9755783 idf(contents: rn=8322) 0.06287027 queryNorm 35.18265 fieldWeight(contents:rn in 110320), product of: 7.071068 btq, product of: 1.4142135 tf(phraseFreq=3.0) 5.0 scorePayload(...) 4.9755783 idf(contents: rn=8322) 1.0 fieldNorm(field=contents, doc=110320) 10.136917 weight(contents:nursing in 110320), product of: 0.3002155 queryWeight(contents:nursing), product of: 4.7751584 idf(contents: nursing=10169) 0.06287027 queryNorm 33.76547 fieldWeight(contents:nursing in 110320), product of: 7.071068 btq, product of: 1.4142135 tf(phraseFreq=11.0) 5.0 scorePayload(...) 4.7751584 idf(contents: nursing=10169) 1.0 fieldNorm(field=contents, doc=110320) 2.0332527 weight(contents:hospital in 110320), product of: 0.30064976 queryWeight(contents:hospital), product of: 4.7820654 idf(contents: hospital=10099) 0.06287027 queryNorm 6.7628617 fieldWeight(contents:hospital in 110320), product of: 1.4142135 btq, product of: 1.4142135 tf(phraseFreq=3.0) 1.0 scorePayload(...) 4.7820654 idf(contents: hospital=10099) 1.0 fieldNorm(field=contents, doc=110320) 0.8 coord(4/5) Doc 2: 26.407215 sum of: 26.407215 product of: 33.009018 sum of: 10.403484 weight(contents:nurse in 271166), product of: 0.30413723 queryWeight(contents:nurse), product of: 4.8375363 idf(contents: nurse=9554) 0.06287027 queryNorm 34.206547 fieldWeight(contents:nurse in 271166), product of: 7.071068 btq, product of: 1.4142135 tf(phraseFreq=4.0) 5.0 scorePayload(...) 4.8375363 idf(contents: nurse=9554) 1.0 fieldNorm(field=contents, doc=271166) 11.005695 weight(contents:rn in 271166), product of: 0.31281596 queryWeight(contents:rn), product of: 4.9755783 idf(contents: rn=8322) 0.06287027 queryNorm 35.18265 fieldWeight(contents:rn in 271166), product of: 7.071068 btq, product of: 1.4142135 tf(phraseFreq=4.0) 5.0 scorePayload(...) 4.9755783 idf(contents: rn=8322) 1.0 fieldNorm(field=contents, doc=271166) 1.4335766 weight(contents:nursing in 271166), product of: 0.3002155 queryWeight(contents:nursing), product of: 4.7751584 idf(contents: nursing=10169) 0.06287027 queryNorm 4.7751584 fieldWeight(contents:nursing in 271166), product of: 1.0 btq, product of: 1.0 tf(phraseFreq=1.0) 1.0 scorePayload(...) 4.7751584 idf(contents: nursing=10169) 1.0 fieldNorm(field=contents, doc=271166) 10.166264 weight(contents:hospital in 271166), product of: 0.30064976 queryWeight(contents:hospital), product of: 4.7820654 idf(contents: hospital=10099) 0.06287027 queryNorm 33.81431 fieldWeight(contents:hospital in 271166), product of: 7.071068 btq, product of: 1.4142135 tf(phraseFreq=9.0) 5.0 scorePayload(...) 4.7820654 idf(contents: hospital=10099) 1.0 fieldNorm(field=contents, doc=271166) 0.8 coord(4/5) Peter On Wed, May 5, 2010 at 10:10 AM, Grant Ingersoll <gsing...@apache.org>wrote: > Thanks, Peter. > > Can you share what kind of evaluations you did to determine that the end > user believed the results were equally relevant? How formal was that > process? > > -Grant > > On May 3, 2010, at 11:08 AM, Peter Keegan wrote: > > > We discovered very soon after going to production that Lucene's scores > were > > often 'too precise'. For example, a page of 25 results may have several > > different score values, and all within 15% of each other, but to the end > > user all 25 results were equally relevant. Thus we wanted the secondary > sort > > field to determine the order, instead. This required writing a custom > score > > comparator to 'round' the scores. The same thing occurred for distance > > sorting. We also limit the effect of term frequency to help prevent > > spamming. In comparison to Avi, we use 'AND' as the default operator for > > keyword queries and if no docs are found, the query is automatically > retried > > with 'OR'. This improves precision a bit and only occurs if the user > > provides no operators. > > > > Lucene's Explanation class has been invaluable in helping me to explain a > > particular sort order in many, many situations. > > Most of our relevance tuning has occurred after deployment to production. > > > > Peter > > > > On Thu, Apr 29, 2010 at 10:14 AM, Grant Ingersoll <gsing...@apache.org > >wrote: > > > >> I'm putting on a talk at Lucene Eurocon ( > >> http://lucene-eurocon.org/sessions-track1-day2.html#1) on "Practical > >> Relevance" and I'm curious as to what people put in practice for testing > and > >> improving relevance. I have my own inclinations, but I don't want to > muddy > >> the water just yet. So, if you have a few moments, I'd love to hear > >> responses to the following questions. > >> > >> What worked? > >> What didn't work? > >> What didn't you understand about it? > >> What tools did you use? > >> What tools did you wish you had either for debugging relevance or > "fixing" > >> it? > >> How much time did you spend on it? > >> How did you avoid over/under tuning? > >> What stage of development/testing/production did you decide to do > relevance > >> tuning? Was that timing planned or not? > >> > >> > >> Thanks, > >> Grant > >> > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > >