We are looking into making some improvements to the relevance ranking of our 
Lucene-based search platform.  We started by running the TREC Ad Hoc task on 
the TREC-3 data (TIPSTER Disk 1 and Disk 2; topics 151-200) using 
"out-of-the-box" Lucene.  We chose this older collection because its content 
closely matches the content of our production system.  

We are currently getting an average precision of 0.14.  We found some 
formatting issues in the TREC-3 topics that were causing an even lower score: 
our initial average precision was 0.09.  We discovered that the topics include 
the word "Topic:" inside the <title> tag, for example 
"<title> Topic:  Coping with overcrowded prisons".  After removing this term 
from the queries, the average precision rose to 0.14.
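The cleanup itself is nothing fancy; it is roughly the sketch below (the file 
names are placeholders and the string handling is simplified):

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Strips the literal "Topic:" marker from the <title> lines of the TREC-3
// topics file before the file is handed to the topics reader.
public class CleanTrecTopics {
    public static void main(String[] args) throws IOException {
        Path in = Paths.get("topics.151-200");        // original topics file (placeholder path)
        Path out = Paths.get("topics.151-200.clean"); // cleaned copy (placeholder path)
        List<String> cleaned = new ArrayList<String>();
        for (String line : Files.readAllLines(in, StandardCharsets.ISO_8859_1)) {
            if (line.contains("<title>")) {
                line = line.replace("Topic:", "");
            }
            cleaned.add(line);
        }
        Files.write(out, cleaned, StandardCharsets.ISO_8859_1);
    }
}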

Our queries are built from the <title> tag of each topic, and the indexed 
field is built from the <TEXT> tag of each document.  

QualityQueryParser qqParser = new SimpleQQParser("title", "TEXT");
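For context, the rest of the quality run is set up roughly as in the sketch 
below, using the benchmark module's quality package; the topic/qrels file 
names, the "docno" document-name field, and the run name are placeholders for 
what our index and scripts actually use:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.PrintWriter;
import org.apache.lucene.benchmark.quality.Judge;
import org.apache.lucene.benchmark.quality.QualityBenchmark;
import org.apache.lucene.benchmark.quality.QualityQuery;
import org.apache.lucene.benchmark.quality.QualityQueryParser;
import org.apache.lucene.benchmark.quality.QualityStats;
import org.apache.lucene.benchmark.quality.trec.TrecJudge;
import org.apache.lucene.benchmark.quality.trec.TrecTopicsReader;
import org.apache.lucene.benchmark.quality.utils.SimpleQQParser;
import org.apache.lucene.benchmark.quality.utils.SubmissionReport;
import org.apache.lucene.search.IndexSearcher;

public class TrecQualityRun {
    // 'searcher' is an IndexSearcher opened on the TIPSTER Disk 1+2 index.
    static void run(IndexSearcher searcher) throws Exception {
        PrintWriter logger = new PrintWriter(System.out, true);

        // Read topics 151-200 and the matching relevance judgments.
        TrecTopicsReader qReader = new TrecTopicsReader();
        QualityQuery[] qqs = qReader.readQueries(
            new BufferedReader(new FileReader("topics.151-200.clean")));
        Judge judge = new TrecJudge(new BufferedReader(new FileReader("qrels.151-200")));
        judge.validateData(qqs, logger);

        // Query text comes from the topic <title>; it is run against the TEXT field.
        QualityQueryParser qqParser = new SimpleQQParser("title", "TEXT");

        // "docno" is the stored field holding each document's TREC DOCNO.
        QualityBenchmark qrun = new QualityBenchmark(qqs, qqParser, searcher, "docno");
        SubmissionReport submitLog = new SubmissionReport(
            new PrintWriter(new FileWriter("submission.txt")), "lucene-oob");
        QualityStats[] stats = qrun.execute(judge, submitLog, logger);

        // Average the per-topic stats and log a summary (includes average precision).
        QualityStats avg = QualityStats.average(stats);
        avg.log("SUMMARY", 2, logger, "  ");
    }
}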

Is there an average precision number that "out-of-the-box" Lucene should be 
close to?  For example, IBM's 2007 TREC paper mentions 0.154:  
http://trec.nist.gov/pubs/trec16/papers/ibm-haifa.mq.final.pdf

Thank you,

Ivan


      
