add FieldInvertState.numUniqueTerms, Terms.sumDocFreq
-----------------------------------------------------
Key: LUCENE-3290
URL: https://issues.apache.org/jira/browse/LUCENE-3290
Project: Lucene - Java
Issue Type: Improvement
Components: core/index
Reporter: Robert Muir
Assignee: Robert Muir
Fix For: 4.0
For scoring systems like lnu.ltc
(http://trec.nist.gov/pubs/trec16/papers/ibm-haifa.mq.final.pdf), we need to
supply 3 stats:
* average tf within d
* # of unique terms within d
* average number of unique terms per document across the field
If we add FieldInvertState.numUniqueTerms, you can incorporate the first two
into your norms/docvalues (once we cut over):
the average tf within d is length / numUniqueTerms.
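To compute the average across the field, we can just write the sum of all
terms' docfreqs into the terms dictionary header (Terms.sumDocFreq).
Each term's docfreq counts the documents that contain it, so this sum equals
the total of the per-document unique term counts,
and you can then divide it by maxDoc to get the average.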
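
A minimal sketch of the arithmetic above, in plain Java with no Lucene
dependency. The class FieldStatsSketch, its method names, and the toy
token lists are hypothetical and only illustrate the three statistics; they are
not the proposed patch or any Lucene API. The document count stands in for
maxDoc here, under the assumption that no documents are deleted.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

public class FieldStatsSketch {

    /** Distinct terms in one document's field (the value numUniqueTerms would carry). */
    public static int numUniqueTerms(List<String> tokens) {
        return new HashSet<>(tokens).size();
    }

    /** Average tf within a document: length / numUniqueTerms. */
    public static double averageTf(List<String> tokens) {
        int unique = numUniqueTerms(tokens);
        return unique == 0 ? 0.0 : (double) tokens.size() / unique;
    }

    /** Sum of every term's document frequency across the field. */
    public static long sumDocFreq(List<List<String>> docs) {
        Map<String, Integer> docFreq = new HashMap<>();
        for (List<String> doc : docs) {
            // Count each term once per document, regardless of its tf.
            for (String term : new HashSet<>(doc)) {
                docFreq.merge(term, 1, Integer::sum);
            }
        }
        long sum = 0;
        for (int df : docFreq.values()) {
            sum += df;
        }
        return sum;
    }

    /** Average number of unique terms per document: sumDocFreq / document count. */
    public static double averageUniqueTerms(List<List<String>> docs) {
        return docs.isEmpty() ? 0.0 : (double) sumDocFreq(docs) / docs.size();
    }

    public static void main(String[] args) {
        List<List<String>> docs = Arrays.asList(
            Arrays.asList("a", "b", "a", "c"),  // length 4, 3 unique terms
            Arrays.asList("a", "d"));           // length 2, 2 unique terms

        System.out.println("avg tf in doc 0:  " + averageTf(docs.get(0)));
        System.out.println("sumDocFreq:       " + sumDocFreq(docs));
        System.out.println("avg unique terms: " + averageUniqueTerms(docs));
    }
}
```

In the toy data the docfreqs are a=2, b=1, c=1, d=1, which sum to 5, the same
as adding the per-document unique counts 3 + 2. That equality is why a single
sum in the terms dictionary header is enough to recover the average.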
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira