I applied the Lucene patch mentioned in 
https://issues.apache.org/jira/browse/LUCENE-2091 and ran the MAP numbers on 
TREC-3 collection using topics 151-200.  I am not getting worse results 
comparing to Lucene DefaultSimilarity.  I suspect, I am not using it correctly. 
 I have single field documents.  This is the process I use:

1. During the indexing, I am setting the similarity to BM25 as such:

IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(
                Version.LUCENE_CURRENT), true,
                IndexWriter.MaxFieldLength.UNLIMITED);
writer.setSimilarity(new BM25Similarity());

2. During the Precision/Recall measurements, I am using a SimpleBM25QQParser 
extension I added to the benchmark:

QualityQueryParser qqParser = new SimpleBM25QQParser("title", "TEXT");


3. Here is the parser code (I set an avg doc length here):

public Query parse(QualityQuery qq) throws ParseException {
    BM25Parameters.setAverageLength(indexField, 798.30f);//avg doc length
    BM25Parameters.setB(0.5f);//tried default values
    BM25Parameters.setK1(2f);
    return query = new BM25BooleanQuery(qq.getValue(qqName), indexField, new 
StandardAnalyzer(Version.LUCENE_CURRENT));
}

4. The searcher is using BM25 similarity:

Searcher searcher = new IndexSearcher(dir, true);
searcher.setSimilarity(sim);

Am I missing some steps?  Does anyone have experience with this code?

Thanks,

Ivan


      

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

Reply via email to