[
https://issues.apache.org/jira/browse/LUCENE-8221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16413484#comment-16413484
]
Dawid Weiss commented on LUCENE-8221:
-------------------------------------
True, although from a logical standpoint I think numDocs makes more sense than
maxDoc -- I'd typically want those thresholds calculated based on the actual
number of documents in the index at any given moment, without regard to how
deleted documents are represented. It's docFreq that should be adjusted to
account for deletions (in the future), not numDocs?
> MoreLikeThis.setMaxDocFreqPct can easily int-overflow on larger indexes
> -----------------------------------------------------------------------
>
> Key: LUCENE-8221
> URL: https://issues.apache.org/jira/browse/LUCENE-8221
> Project: Lucene - Core
> Issue Type: Bug
> Reporter: Dawid Weiss
> Assignee: Dawid Weiss
> Priority: Minor
> Attachments: LUCENE-8221.patch
>
>
> {code}
> public void setMaxDocFreqPct(int maxPercentage) {
>   this.maxDocFreq = maxPercentage * ir.numDocs() / 100;
> }
> {code}
> The above overflows the integer range into negative numbers on even fairly
> small indexes (for maxPercentage = 75, it happens at just over 28 million
> documents).
> We should perform the computation in long range so that it doesn't overflow,
> and add stricter argument validation.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]