[
https://issues.apache.org/jira/browse/LUCENE-3290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Uwe Schindler reopened LUCENE-3290:
-----------------------------------
I am reopening this one:
bq. The FieldInvertState.numUniqueTerms portion is backported to 3.x (no
collection level stats are in 3.x in general, seems tricky)
Because we backported this, we must add a Lucene 3.4 backwards-compatibility
index to the TestBackwardsCompatibility test. And hopefully this new 3.4 index
format opens successfully in trunk!
> add FieldInvertState.numUniqueTerms, Terms.sumDocFreq
> -----------------------------------------------------
>
> Key: LUCENE-3290
> URL: https://issues.apache.org/jira/browse/LUCENE-3290
> Project: Lucene - Java
> Issue Type: Improvement
> Components: core/index
> Reporter: Robert Muir
> Assignee: Robert Muir
> Fix For: 3.4, 4.0
>
> Attachments: LUCENE-3290.patch, LUCENE-3290.patch
>
>
> For scoring systems like lnu.ltc
> (http://trec.nist.gov/pubs/trec16/papers/ibm-haifa.mq.final.pdf), we need to
> supply 3 stats:
> * average tf within d
> * # of unique terms within d
> * average number of unique terms across field
> If we add FieldInvertState.numUniqueTerms, you can incorporate the first two
> into your norms/docvalues (once we cut over),
> the average tf within d being length / numUniqueTerms.
> To compute the average across the field, we can just write the sum of all
> terms' docfreqs into the terms dictionary header,
> and you can then divide this by maxdoc to get the average.
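
The arithmetic in the description can be sketched in plain Java. This is only an illustration under stated assumptions, not the patch itself and not Lucene's API: the class `UniqueTermStats` and all of its methods are hypothetical, documents are modelled as in-memory token lists, and maxdoc is taken to be the number of documents (no deletions).

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Lucene: it mirrors the three statistics
// from the issue description using plain collections.
public class UniqueTermStats {

    // Number of distinct terms in one document's field
    // (the value the issue proposes to expose as numUniqueTerms).
    static int numUniqueTerms(List<String> tokens) {
        return new HashSet<>(tokens).size();
    }

    // Average tf within d: field length divided by number of unique terms.
    static double averageTf(List<String> tokens) {
        int unique = numUniqueTerms(tokens);
        return unique == 0 ? 0.0 : (double) tokens.size() / unique;
    }

    // Sum of all terms' document frequencies across the field.
    static long sumDocFreq(List<List<String>> docs) {
        Map<String, Integer> docFreq = new HashMap<>();
        for (List<String> doc : docs) {
            // Each document counts at most once per term.
            for (String term : new HashSet<>(doc)) {
                docFreq.merge(term, 1, Integer::sum);
            }
        }
        long sum = 0;
        for (int df : docFreq.values()) {
            sum += df;
        }
        return sum;
    }

    // Average number of unique terms per document: sumDocFreq / maxdoc.
    // Assumption: maxdoc equals docs.size(), i.e. no deleted documents.
    static double averageUniqueTerms(List<List<String>> docs) {
        return docs.isEmpty() ? 0.0 : (double) sumDocFreq(docs) / docs.size();
    }

    public static void main(String[] args) {
        List<List<String>> docs = Arrays.asList(
                Arrays.asList("a", "b", "a"),
                Arrays.asList("b", "c"));
        System.out.println("avg tf in doc 0: " + averageTf(docs.get(0)));
        System.out.println("sumDocFreq: " + sumDocFreq(docs));
        System.out.println("avg unique terms: " + averageUniqueTerms(docs));
    }
}
```

Dividing the sum of docfreqs by maxdoc works because every (term, document) pair contributes exactly one to the sum, so the sum equals the total of the per-document unique term counts.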