I think so. When adding this statistic (Lucene 4.0), I personally
really wanted to fix it everywhere. But we had the problem of
backwards compatibility, and it's bad to use different formulas for
different segments, even if it works...

Nowadays we don't have Lucene 3 segments around anymore, so I think we
should fix this. Want to open an issue?
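
To illustrate why the choice matters, here is a minimal sketch. It does
not use any Lucene classes: the class DocCountSketch, its methods, the
sample documents, and the ln(N / df) weight are all hypothetical
stand-ins, and the weight is not the formula of any particular Lucene
similarity. The field test is also simplified: Lucene's docCount()
counts documents with at least one term for the field, while the sketch
just checks whether the field is present.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical, Lucene-free illustration of maxDoc vs. a per-field docCount.
class DocCountSketch {

    // Stand-in for maxDoc(): every document counts, with or without the field.
    public static long maxDoc(List<Map<String, String>> docs) {
        return docs.size();
    }

    // Stand-in for docCount(): only documents that have the field at all.
    public static long docCount(List<Map<String, String>> docs, String field) {
        return docs.stream().filter(d -> d.containsKey(field)).count();
    }

    // Number of documents whose field contains the term as a whitespace token.
    public static long docFreq(List<Map<String, String>> docs, String field, String term) {
        return docs.stream()
                .filter(d -> d.containsKey(field))
                .filter(d -> Arrays.asList(d.get(field).toLowerCase().split("\\s+")).contains(term))
                .count();
    }

    // Illustrative weight only (assumption): ln(N / df).
    public static double idf(long numberOfDocuments, long docFreq) {
        return Math.log((double) numberOfDocuments / docFreq);
    }

    // Four documents, only two of which have a "title" field.
    public static List<Map<String, String>> sampleDocs() {
        return List.of(
                Map.of("title", "lucene in action", "body", "search library"),
                Map.of("title", "cooking basics", "body", "recipes"),
                Map.of("body", "no title here"),
                Map.of("body", "still no title"));
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = sampleDocs();
        long df = docFreq(docs, "title", "lucene");
        System.out.println("maxDoc   = " + maxDoc(docs));
        System.out.println("docCount = " + docCount(docs, "title"));
        System.out.println("idf with maxDoc   = " + idf(maxDoc(docs), df));
        System.out.println("idf with docCount = " + idf(docCount(docs, "title"), df));
    }
}
```

In this toy data the term appears in half of the documents that have a
title at all, yet counting against maxDoc makes it look twice as rare.
The sparser the field, the larger that gap becomes.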

On Wed, Jul 29, 2015 at 10:45 AM, Ahmet Arslan
<[email protected]> wrote:
> Hello List,
>
> SimilarityBase uses CollectionStatistics#maxDoc() for numberOfDocuments.
> Shouldn't it be field-based CollectionStatistics#docCount()?
>
> --- core/src/java/org/apache/lucene/search/similarities/SimilarityBase.java (revision 1693268)
> +++ core/src/java/org/apache/lucene/search/similarities/SimilarityBase.java (working copy)
> @@ -102,7 +102,7 @@
> protected void fillBasicStats(BasicStats stats, CollectionStatistics collectionStats, TermStatistics termStats) {
> // #positions(field) must be >= #positions(term)
> assert collectionStats.sumTotalTermFreq() == -1 || collectionStats.sumTotalTermFreq() >= termStats.totalTermFreq();
> -    long numberOfDocuments = collectionStats.maxDoc();
> +    long numberOfDocuments = collectionStats.docCount();
>
>
> Thanks,
> Ahmet
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
