Hi all,
I've got a puzzling issue here. During tests I noticed a document at the
bottom of the results where it should not be. I query using DisMax on the
title and content fields, with a boost on title via qf. Out of 30 results,
only two documents also have the term in the title.
Using debugQuery and fl=*,score I quickly noticed a large negative maxScore
for the complete result set, and a portion of the result set where scores come
out as zero because of a product with 0 (fieldNorm).
See below for debug output for a result with score = 0:
0.0 = (MATCH) sum of:
0.0 = (MATCH) max of:
0.0 = (MATCH) weight(content:kunstgrasveld in 7), product of:
0.75658196 = queryWeight(content:kunstgrasveld), product of:
6.6516633 = idf(docFreq=33, maxDocs=9682)
0.113743275 = queryNorm
0.0 = (MATCH) fieldWeight(content:kunstgrasveld in 7), product of:
2.236068 = tf(termFreq(content:kunstgrasveld)=5)
6.6516633 = idf(docFreq=33, maxDocs=9682)
0.0 = fieldNorm(field=content, doc=7)
0.0 = (MATCH) fieldWeight(title:kunstgrasveld in 7), product of:
1.0 = tf(termFreq(title:kunstgrasveld)=1)
8.791729 = idf(docFreq=3, maxDocs=9682)
0.0 = fieldNorm(field=title, doc=7)
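As a sanity check on the factors that are not zero, here is a small Python
sketch that recomputes tf and idf from the counts in the explain output. It
assumes Lucene's DefaultSimilarity formulas, tf = sqrt(termFreq) and
idf = 1 + ln(maxDocs / (docFreq + 1)); the function names are mine and not part
of any API.

```python
import math

def tf(term_freq):
    # DefaultSimilarity: square root of the raw term frequency
    return math.sqrt(term_freq)

def idf(doc_freq, max_docs):
    # DefaultSimilarity: 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))

print(tf(5))           # content tf for doc 7
print(idf(33, 9682))   # content idf
print(idf(3, 9682))    # title idf
```

These come out close to the values Solr reports (2.236068, 6.6516633 and
8.791729), so the only factor that looks off is fieldNorm.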
And one with a negative score:
3.0716116E-4 = (MATCH) sum of:
3.0716116E-4 = (MATCH) max of:
3.0716116E-4 = (MATCH) weight(content:kunstgrasveld in 1462), product of:
0.75658196 = queryWeight(content:kunstgrasveld), product of:
6.6516633 = idf(docFreq=33, maxDocs=9682)
0.113743275 = queryNorm
4.059853E-4 = (MATCH) fieldWeight(content:kunstgrasveld in 1462), product of:
1.0 = tf(termFreq(content:kunstgrasveld)=1)
6.6516633 = idf(docFreq=33, maxDocs=9682)
6.1035156E-5 = fieldNorm(field=content, doc=1462)
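Multiplying the factors out by hand reproduces the reported numbers, so the
explain tree is at least internally consistent. A minimal Python sketch, with
the values copied from the output for doc 1462 (the variable names are mine):

```python
# Values copied from the explain output for doc 1462.
idf = 6.6516633
query_norm = 0.113743275
tf = 1.0
field_norm = 6.1035156e-5

query_weight = idf * query_norm        # reported: 0.75658196
field_weight = tf * idf * field_norm   # reported: 4.059853E-4
score = query_weight * field_weight    # reported: 3.0716116E-4

print(query_weight, field_weight, score)
```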
There are no funky issues with term analysis for the text fieldType; in fact,
the term passes through unchanged. I don't use omitNorms, and I store
termVectors etc.
Because fieldNorm = fieldBoost / sqrt(numTermsForField), I suspect my input from
Nutch is messed up. A fieldNorm can never be <= 0 for a normal positive boost,
and field boosts should not be zero or negative (correct me if I'm wrong). But
since I can't yet figure out what field boosts Nutch sends me, I thought I'd
drop by on this mailing list first.
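To get a feel for what the fieldNorm formula implies, it can be inverted
for either unknown. The Python sketch below is only a rough estimate: the helper
names are mine, the 1000-term field length is a made-up example, and the formula
leaves out the document boost, which as far as I know is multiplied into the norm
as well.

```python
def implied_num_terms(field_norm, field_boost=1.0):
    # Invert fieldNorm = fieldBoost / sqrt(numTerms) for numTerms.
    return (field_boost / field_norm) ** 2

def implied_boost(field_norm, num_terms):
    # Invert the same formula for fieldBoost, given a field length.
    return field_norm * num_terms ** 0.5

# Terms the content field would need if the boost were a plain 1.0.
print(implied_num_terms(6.1035156e-5))

# Boost implied by a hypothetical 1000-term content field.
print(implied_boost(6.1035156e-5, 1000))
```

With a boost of 1.0 the content field of doc 1462 would need roughly 268 million
terms to end up at that norm, which is not plausible for a crawled page, so a
very small boost looks more likely. The norm is stored in a single byte, so these
numbers are coarse at best.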
Quite a few query terms return zero or negative scores, while many others
behave as I expect. I also find it a bit hard to comprehend why the docs with
a negative score rank higher in the result set than documents with a zero
score. Sorting defaults to score DESC, but this is perhaps another issue.
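To double-check the ordering, here is a small Python sketch that parses the two
scores from the explain outputs and sorts them the way score DESC would. The
dictionary is just a stand-in for the result set. One thing it suggests: read as
scientific notation, 3.0716116E-4 is 0.00030716116, a tiny positive value rather
than a negative one, which would be consistent with it sorting above 0.0.

```python
# Scores copied from the two explain outputs, keyed by Lucene doc id.
scores = {7: float("0.0"), 1462: float("3.0716116E-4")}

# Sort by score, highest first, as score DESC would.
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)

for doc_id, score in ranked:
    print(doc_id, score, "negative" if score < 0 else "not negative")
```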
Anyway, the test runs on a Solr 1.4.1 instance with Java 6 under the hood.
Help or directions are appreciated =)
Cheers,
--
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536600 / 06-50258350