On 11/7/06, Antony Bowesman <[EMAIL PROTECTED]> wrote:
I've been trying to understand how idf is arrived at from a query. I have a
single Document with 9 fields. One field "subject" has the phrase "RFC2822 -
Internet Message Format" and a second "body" has the contents of rfc2822.
The other fields contain additional meta data. If I search for subject:message
I get the following explanation.
0.15342641 = fieldWeight(subject:message in 0), product of:
1.0 = tf(termFreq(subject:message)=1)
0.30685282 = idf(docFreq=1)
0.5 = fieldNorm(field=subject, doc=0)
why does the idf get that value?
idf is dependent only on the corpus, not on the individual document.
The formula is here:
http://lucene.apache.org/java/docs/api/org/apache/lucene/search/Similarity.html
1+log(1/2) = 0.30685282
-Yonik
http://incubator.apache.org/solr Solr, the open-source Lucene search server
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]