I applied the Lucene patch mentioned in https://issues.apache.org/jira/browse/LUCENE-2091 and ran the MAP numbers on TREC-3 collection using topics 151-200. I am not getting worse results comparing to Lucene DefaultSimilarity. I suspect, I am not using it correctly. I have single field documents. This is the process I use:
1. During the indexing, I am setting the similarity to BM25 as such: IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer( Version.LUCENE_CURRENT), true, IndexWriter.MaxFieldLength.UNLIMITED); writer.setSimilarity(new BM25Similarity()); 2. During the Precision/Recall measurements, I am using a SimpleBM25QQParser extension I added to the benchmark: QualityQueryParser qqParser = new SimpleBM25QQParser("title", "TEXT"); 3. Here is the parser code (I set an avg doc length here): public Query parse(QualityQuery qq) throws ParseException { BM25Parameters.setAverageLength(indexField, 798.30f);//avg doc length BM25Parameters.setB(0.5f);//tried default values BM25Parameters.setK1(2f); return query = new BM25BooleanQuery(qq.getValue(qqName), indexField, new StandardAnalyzer(Version.LUCENE_CURRENT)); } 4. The searcher is using BM25 similarity: Searcher searcher = new IndexSearcher(dir, true); searcher.setSimilarity(sim); Am I missing some steps? Does anyone have experience with this code? Thanks, Ivan --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org