Re: Lucene Approximation

2020-06-02 Thread Michael Sokolov
Sorry, I thought that you wanted to maintain the true value rather than the approximated value. I am not entirely sure, but I think the approximation arises due to rounding and low-precision storage of these values in the index. You might be able to reverse engineer it by looking at "Norms," which
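
The rounding described above can be reproduced outside Lucene. What follows is a minimal Python sketch, re-implemented from the one-byte length encoding that Lucene's `SmallFloat.intToByte4` and `byte4ToInt` helpers are understood to use for norms in recent versions. The Python function names are my own, and the bit layout is an assumption that should be checked against the Lucene version actually in use.

```python
def long_to_int4(i: int) -> int:
    """Encode a non-negative integer keeping only its 4 most significant bits."""
    if i < 0:
        raise ValueError("only non-negative values are supported")
    num_bits = i.bit_length()
    if num_bits < 4:
        return i  # small ("subnormal") values are stored exactly
    shift = num_bits - 4
    # Keep the 3 bits below the leading one; the leading bit is implicit.
    encoded = (i >> shift) & 0x07
    # Store shift + 1 in the upper bits; 0 is reserved for subnormal values.
    return encoded | ((shift + 1) << 3)


def int4_to_long(i: int) -> int:
    """Inverse of long_to_int4, up to the precision that was dropped."""
    bits = i & 0x07
    shift = (i >> 3) - 1
    if shift == -1:
        return bits
    return (bits | 0x08) << shift


# Byte values not needed to cover the 32-bit integer range are spent on
# storing the smallest lengths exactly.
MAX_INT4 = long_to_int4(2**31 - 1)
NUM_FREE_VALUES = 255 - MAX_INT4


def int_to_byte4(length: int) -> int:
    """Map a document length to a value in 0..255 (assumed norm encoding)."""
    if length < 0:
        raise ValueError("only non-negative values are supported")
    if length < NUM_FREE_VALUES:
        return length
    return NUM_FREE_VALUES + long_to_int4(length - NUM_FREE_VALUES)


def byte4_to_int(b: int) -> int:
    """Decode a value in 0..255 back to an approximate document length."""
    if b < NUM_FREE_VALUES:
        return b
    return NUM_FREE_VALUES + int4_to_long(b - NUM_FREE_VALUES)


def approximate_length(length: int) -> int:
    """Round-trip a length through the one-byte encoding."""
    return byte4_to_int(int_to_byte4(length))
```

Under this sketch, `approximate_length(756)` returns 728, while lengths below 24 come back unchanged. Whether these figures match what a given index stores depends on the Lucene version and the Similarity in use.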

Re: Lucene Approximation

2020-06-02 Thread moritz
Thank you for your answer, but could you please explain this idea in more detail? I cannot see how it would help solve my problem. For example, I have the indexed Wikipedia article "Alan Smithee" with a document length of 756, which is also used when calculating the average document length. Bu

Re: Lucene Approximation

2020-06-02 Thread Michael Sokolov
You could append an EOF token to every indexed text, and then iterate over Terms to get the positions of those tokens?

On Tue, Jun 2, 2020 at 11:50 AM Moritz Staudinger wrote:
> Hello,
>
> I am not sure if I am at the right place here, but I got a question about
> the approximation my Lucene im
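
The EOF-token suggestion can be illustrated without Lucene. The sketch below is plain Python, not Lucene API: `EOF_TOKEN`, `tokenize`, `index_positions` and `exact_length` are hypothetical names, and whitespace tokenization stands in for whatever analyzer the index really uses. The point is only that the position of a sentinel appended after the last token equals the exact token count.

```python
from typing import Dict, List

# Hypothetical sentinel; it must never occur in the real text.
EOF_TOKEN = "__eof__"


def tokenize(text: str) -> List[str]:
    """Stand-in analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def index_positions(text: str) -> Dict[str, List[int]]:
    """Build a tiny positional index for one document, with the sentinel last."""
    postings: Dict[str, List[int]] = {}
    for position, token in enumerate(tokenize(text) + [EOF_TOKEN]):
        postings.setdefault(token, []).append(position)
    return postings


def exact_length(postings: Dict[str, List[int]]) -> int:
    """The sentinel's zero-based position is the number of real tokens."""
    return postings[EOF_TOKEN][0]
```

In a real index the analyzer matters: if it drops tokens but leaves position gaps (stop-word removal, for instance), the sentinel position and the length used for scoring may no longer agree.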

Lucene Approximation

2020-06-02 Thread Moritz Staudinger
Hello, I am not sure if I am at the right place here, but I got a question about the approximation my Lucene implementation does. I am trying to calculate the same scores Lucene's BM25Similarity calculates, but I found out that Lucene only approximates the length of documents for scoring but us
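
To see why the document length matters when reproducing scores, here is a minimal sketch of a textbook BM25 term score in Python. The function names are my own. The defaults `k1=1.2` and `b=0.75` are the documented defaults of `BM25Similarity`, and the idf form is the one Lucene documents. The `(k1 + 1)` factor in the numerator is the textbook form, so Lucene's own output may differ by a constant factor depending on the version.

```python
import math


def bm25_idf(doc_count: int, doc_freq: int) -> float:
    """Inverse document frequency: ln(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_term_score(tf: float, doc_len: float, avg_doc_len: float,
                    idf: float, k1: float = 1.2, b: float = 0.75) -> float:
    """Textbook BM25 score of one term in one document."""
    # Length normalization: documents longer than average are penalized.
    norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1) / (tf + norm)
```

Calling `bm25_term_score` twice with the same `tf`, `idf` and average length, once with the exact length (for example the 756 quoted in this thread) and once with a smaller rounded length, gives a slightly higher score for the rounded one. That is the kind of mismatch to expect when recomputing scores from exact lengths.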