OK, I know that the scores returned by Lucene usually don't "really" mean anything in absolute terms. But in my case they do; I tune the similarity and so on. My concern now is that Query.setBoost() does not always seem to affect the score. I've built a simple test (full code at the end) which produces the output below. I'm not using the Hits object but rather TopDocs, so that I have access to the raw, un-normalized scores. Query 1 and query 2 get exactly the same score, while I was expecting query 2 to have half the score of query 1. Query 3, on the other hand, does seem to have been affected by its boost.
Is this normal behaviour? How can I tell whether a boost was applied or not? More importantly, how can I force the boost to be applied? In this case I use the QueryParser; in other cases I build my own TermQuery objects and set a boost on them (see the sketch below). I have the same problem with those TermQueries: the boost just doesn't get applied.
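For reference, here is a minimal sketch of the hand-built TermQuery variant I mean (the field name, terms, and 0.5 boost are just placeholders for what I actually do, and it assumes the BooleanClause.Occur-style add() from recent Lucene versions):

package testing;

import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class BoostedTermQuerySketch {

    // Hand-built equivalent of "labeltxt:post^0.5 labeltxt:office".
    public static Query build(float boost) {
        TermQuery post = new TermQuery(new Term("labeltxt", "post"));
        post.setBoost(boost);                      // boost on a single clause
        TermQuery office = new TermQuery(new Term("labeltxt", "office"));

        BooleanQuery bq = new BooleanQuery();
        bq.add(post, BooleanClause.Occur.SHOULD);
        bq.add(office, BooleanClause.Occur.SHOULD);
        // bq.setBoost(boost);                     // boosting the whole query instead, as in QUERY_2
        return bq;
    }
}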
Any clues? This is a major showstopper for me...
Thanks,
Jp
== OUTPUT ==
QUERY_1=labeltxt:post labeltxt:office
QUERY_2=(labeltxt:post labeltxt:office)^0.5
QUERY_3=labeltxt:post^0.5 labeltxt:office
score_1: 5.139783
score_2: 5.139783
score_3: 4.8512564
explanation for qlbl_1: main post office
5.139783 = sum of:
  3.7358308 = weight(labeltxt:post in 28114), product of:
    0.85255265 = queryWeight(labeltxt:post), product of:
      8.763871 = idf(docFreq=16)
      0.097280376 = queryNorm
    4.3819356 = fieldWeight(labeltxt:post in 28114), product of:
      1.0 = tf(termFreq(labeltxt:post)=1)
      8.763871 = idf(docFreq=16)
      0.5 = fieldNorm(field=labeltxt, doc=28114)
  1.4039522 = weight(labeltxt:office in 28114), product of:
    0.52264136 = queryWeight(labeltxt:office), product of:
      5.372526 = idf(docFreq=504)
      0.097280376 = queryNorm
    2.686263 = fieldWeight(labeltxt:office in 28114), product of:
      1.0 = tf(termFreq(labeltxt:office)=1)
      5.372526 = idf(docFreq=504)
      0.5 = fieldNorm(field=labeltxt, doc=28114)

explanation for qlbl_2: main post office
5.139783 = sum of:
  3.7358308 = weight(labeltxt:post in 28114), product of:
    0.85255265 = queryWeight(labeltxt:post), product of:
      8.763871 = idf(docFreq=16)
      0.097280376 = queryNorm
    4.3819356 = fieldWeight(labeltxt:post in 28114), product of:
      1.0 = tf(termFreq(labeltxt:post)=1)
      8.763871 = idf(docFreq=16)
      0.5 = fieldNorm(field=labeltxt, doc=28114)
  1.4039522 = weight(labeltxt:office in 28114), product of:
    0.52264136 = queryWeight(labeltxt:office), product of:
      5.372526 = idf(docFreq=504)
      0.097280376 = queryNorm
    2.686263 = fieldWeight(labeltxt:office in 28114), product of:
      1.0 = tf(termFreq(labeltxt:office)=1)
      5.372526 = idf(docFreq=504)
      0.5 = fieldNorm(field=labeltxt, doc=28114)

explanation for qlbl_3: main post office
4.8512564 = sum of:
  2.7695916 = weight(labeltxt:post^0.5 in 28114), product of:
    0.63204753 = queryWeight(labeltxt:post^0.5), product of:
      0.5 = boost
      8.763871 = idf(docFreq=16)
      0.14423935 = queryNorm
    4.3819356 = fieldWeight(labeltxt:post in 28114), product of:
      1.0 = tf(termFreq(labeltxt:post)=1)
      8.763871 = idf(docFreq=16)
      0.5 = fieldNorm(field=labeltxt, doc=28114)
  2.081665 = weight(labeltxt:office in 28114), product of:
    0.7749297 = queryWeight(labeltxt:office), product of:
      5.372526 = idf(docFreq=504)
      0.14423935 = queryNorm
    2.686263 = fieldWeight(labeltxt:office in 28114), product of:
      1.0 = tf(termFreq(labeltxt:office)=1)
      5.372526 = idf(docFreq=504)
      0.5 = fieldNorm(field=labeltxt, doc=28114)
== Java Code ==
package testing;

import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.DefaultSimilarity;
import org.apache.lucene.search.Explanation;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocs;

public class TestBoostQueries {

    public static void main(String[] args) {
        int maxSearchResults = 1;
        WhitespaceAnalyzer analyzer = new WhitespaceAnalyzer();
        try {
            IndexSearcher labelSearcher =
                new IndexSearcher("/tmp/Approach2/indices/memphis_tn_labels");
            labelSearcher.setSimilarity(new DefaultSimilarity());

            Document dd_1, dd_2, dd_3;
            float score_1, score_2, score_3;
            float fact = 0.5f;

            // Query 1: plain parsed query, no boost.
            Query qlbl_1 = QueryParser.parse("post office", "labeltxt", analyzer);
            // Query 2: same parsed query, boost set on the whole (top-level) query.
            Query qlbl_2 = QueryParser.parse("post office", "labeltxt", analyzer);
            qlbl_2.setBoost(fact);
            // Query 3: boost applied to a single term inside the query.
            Query qlbl_3 = QueryParser.parse("post^" + fact + " office", "labeltxt", analyzer);

            System.out.println("QUERY_1=" + qlbl_1.toString());
            System.out.println("QUERY_2=" + qlbl_2.toString());
            System.out.println("QUERY_3=" + qlbl_3.toString());

            TopDocs docs_1 = labelSearcher.search(qlbl_1, null, maxSearchResults);
            TopDocs docs_2 = labelSearcher.search(qlbl_2, null, maxSearchResults);
            TopDocs docs_3 = labelSearcher.search(qlbl_3, null, maxSearchResults);

            for (int j = 0; j < docs_1.scoreDocs.length; j++) {
                dd_1 = labelSearcher.doc(docs_1.scoreDocs[j].doc);
                dd_2 = labelSearcher.doc(docs_2.scoreDocs[j].doc);
                dd_3 = labelSearcher.doc(docs_3.scoreDocs[j].doc);
                System.out.println();

                // Raw, un-normalized scores taken from TopDocs.
                score_1 = docs_1.scoreDocs[j].score;
                score_2 = docs_2.scoreDocs[j].score;
                score_3 = docs_3.scoreDocs[j].score;
                System.out.println("score_1: " + score_1);
                System.out.println("score_2: " + score_2);
                System.out.println("score_3: " + score_3);
                System.out.println();

                Explanation ex_1 = labelSearcher.explain(qlbl_1, docs_1.scoreDocs[j].doc);
                Explanation ex_2 = labelSearcher.explain(qlbl_2, docs_2.scoreDocs[j].doc);
                Explanation ex_3 = labelSearcher.explain(qlbl_3, docs_3.scoreDocs[j].doc);

                System.out.println("explanation for qlbl_1: " + dd_1.get("labeltxt"));
                System.out.println(ex_1.toString());
                System.out.println("explanation for qlbl_2: " + dd_2.get("labeltxt"));
                System.out.println(ex_2.toString());
                System.out.println("explanation for qlbl_3: " + dd_3.get("labeltxt"));
                System.out.println(ex_3.toString());
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}