RE: BM25 Scoring Patch

Yuval Feinstein Thu, 18 Feb 2010 08:21:07 -0800

-----Original Message-----
From: Robert Muir [mailto:rcm...@gmail.com] 
Sent: Thursday, February 18, 2010 3:09 PM
To: java-user@lucene.apache.org
Subject: Re: BM25 Scoring Patch


Yuval, don't we still need this 'document-level IDF' for BM25f?

- Yes, we do need 'document-level IDF' for BM25f.  
- Joaquin tried to bypass this by using the IDF of the field having the longest 
average length instead
- of the document's IDF.
- This introduces some bias into the scoring formula, but maybe it is not too 
large...

On Thu, Feb 18, 2010 at 3:45 AM, Yuval Feinstein <yuv...@answers.com> wrote:

> We could solve this by saying we only incorporate BM25F into Lucene.
> This is a field-based scoring method, so it saves us the need to deal with
> documents.
> Building on Joaquin's work, the extra parts needed IMO are:
> a. Support for storing average length per field during indexing. I think I
> saw some reference to this
> when Grant described the new features in Lucene 2.9. We need to store two
> numbers (say
> number of documents containing the field and average length) to support
> incremental indexing.
> b. Easy integration of BM25F similarity - default parameter values, working
> with regular Lucene class hierarchy.
> c. Support for all regular query types - PhraseQuery, FuzzyQuery etc. (We
> could do this incrementally,
> throwing an "UnsupportedOperationException" in the meantime).
> d. Some work on run-time efficiency, to be near the efficiency of the
> default scoring.
> I could do some of this work myself, but guidance from a Lucene scoring
> guru would be a great help.
> Thanks,
> Yuval
>

RE: BM25 Scoring Patch

Reply via email to