[ 
https://issues.apache.org/jira/browse/LUCENE-8087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16303854#comment-16303854
 ] 

Adrien Grand commented on LUCENE-8087:
--------------------------------------

I started looking into it, but I'm a bit unhappy that this would 
significantly affect the terms dictionary, both its API and its size, while 
still being vulnerable to poisonous documents that could bump the value of 
the maximum score for an entire segment.

> Record per-term max term frequencies
> ------------------------------------
>
>                 Key: LUCENE-8087
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8087
>             Project: Lucene - Core
>          Issue Type: Wish
>            Reporter: Adrien Grand
>            Priority: Minor
>         Attachments: LUCENE-8087.patch
>
>
> I was mostly interested in doing that in order to get better score upper 
> bounds for LUCENE-4100. However, this doesn't help, at least with the tasks 
> that we have for wikimedium10m. I dug into this a bit: the upper bound is 
> not much better if we can't make assumptions about the document length. 
> Ideally we'd need something like the maximum term frequency for each norm 
> value. I'll post the patch in case someone has another use-case for 
> per-term max term frequencies.
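
The point about length can be illustrated with a minimal sketch. It assumes 
a BM25-style term-frequency factor with k1 = 1.2 and b = 0.75, and it assumes 
that a document's length is at least its term frequency. The function names 
are hypothetical and this is not the code from the attached patch.

```python
def bm25_tf_factor(tf, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """Term-frequency part of BM25; IDF is omitted since it is constant per term."""
    norm = k1 * (1.0 - b + b * doc_len / avg_doc_len)
    return tf * (k1 + 1.0) / (tf + norm)


def upper_bound_no_stats(k1=1.2):
    """Bound with no per-term statistics: the limit of the factor as tf grows."""
    return k1 + 1.0


def upper_bound_max_tf(max_tf, avg_doc_len, k1=1.2, b=0.75):
    """Bound when only the per-term max tf is known.

    Assumption: with no length information we must take the shortest
    document that could hold max_tf occurrences, i.e. doc_len == max_tf.
    """
    return bm25_tf_factor(max_tf, max_tf, avg_doc_len, k1, b)


if __name__ == "__main__":
    avg_len = 500.0  # hypothetical average document length
    print("no statistics:         ", upper_bound_no_stats())
    print("max tf = 3, any length:", upper_bound_max_tf(3, avg_len))
    print("tf = 3, average length:", bm25_tf_factor(3, avg_len, avg_len))
    print("max tf = 1000 (outlier):", upper_bound_max_tf(1000, avg_len))
```

Under these assumptions, knowing only the max term frequency lowers the bound 
slightly, while also knowing the length lowers it much more. A single document 
with a very large term frequency pushes the bound back toward k1 + 1 for the 
whole segment.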



