On Jan 26, 2010, at 8:28 AM, Ivan Provalov wrote:

> We are looking into making some improvements to relevance ranking of our
> search platform based on Lucene. We started by running the Ad Hoc TREC task
> on the TREC-3 data using "out-of-the-box" Lucene. The reason to run this old
> TREC-3 (TIPSTER Disk 1 and Disk 2; topics 151-200) data was that the content
> is matching the content of our production system.
>
> We are currently getting average precision of 0.14. We found some format
> issues with the TREC-3 data which were causing even lower score. For
> example, the initial average precision number was 0.9. We discovered that
> the topics included the word "Topic:" in the <title> tag. For example,
> "<title> Topic: Coping with overcrowded prisons". By removing this term
> from the queries, we bumped the average precision to 0.14.

There's usually a lot of this involved in running TREC. I've also seen a good deal of improvement from things like using phrase queries and the Dismax Query Parser in Solr (which uses DisjunctionMaxQuery in Lucene, amongst other things) and by playing around with length normalization.

> Our query is based on the title tag of the topic and the index field is based
> on the <TEXT> tag of the document.
>
> QualityQueryParser qqParser = new SimpleQQParser("title", "TEXT");
>
> Is there an average precision number which "out-of-the-box" Lucene should be
> close to? For example, this IBM's 2007 TREC paper mentions 0.154:
> http://trec.nist.gov/pubs/trec16/papers/ibm-haifa.mq.final.pdf

Hard to say. I can't say I've run TREC 3. You might ask over on the Open Relevance list too (http://lucene.apache.org/openrelevance). I know Robert Muir's done a lot of experiments with Lucene on standard collections like TREC.

I guess the bigger question back to you is: what is your goal? Is it to get better at TREC, or to actually tune your system?

-Grant

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/
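
For anyone following the thread, here is a minimal sketch of the two mechanical pieces discussed above: stripping the stray "Topic:" label from a topic title before it becomes a query, and computing uninterpolated average precision for one topic. It is plain Java with no Lucene dependency. The class name `TrecTopicSketch`, the method names, and the document IDs are all hypothetical, and the sketch assumes each document ID appears at most once in the ranked list. It is not the code used by either poster.

```java
import java.util.List;
import java.util.Set;

// Hypothetical helper class, for illustration only.
class TrecTopicSketch {

    // Remove a leading "Topic:" label (case-insensitive) and surrounding
    // whitespace from a TREC topic title.
    public static String cleanTitle(String rawTitle) {
        return rawTitle.trim().replaceFirst("(?i)^topic:\\s*", "");
    }

    // Uninterpolated average precision for a single topic: the sum of
    // precision@k over every rank k that holds a relevant document,
    // divided by the total number of relevant documents for the topic.
    // Assumes the ranked list contains no duplicate document IDs.
    public static double averagePrecision(List<String> ranked, Set<String> relevant) {
        if (relevant.isEmpty()) {
            return 0.0;
        }
        int hits = 0;
        double sum = 0.0;
        for (int i = 0; i < ranked.size(); i++) {
            if (relevant.contains(ranked.get(i))) {
                hits++;
                sum += (double) hits / (i + 1); // precision at rank i + 1
            }
        }
        return sum / relevant.size();
    }

    public static void main(String[] args) {
        // Title taken from the example quoted in the thread.
        System.out.println(cleanTitle(" Topic: Coping with overcrowded prisons"));

        // Made-up ranking and judgments, purely to exercise the method.
        List<String> ranked = List.of("d3", "d1", "d7", "d2");
        Set<String> relevant = Set.of("d1", "d2");
        System.out.println(averagePrecision(ranked, relevant));
    }
}
```

Averaging this per-topic value over all 50 topics gives the mean average precision figure that numbers like 0.14 or 0.154 refer to. Relevant documents that never appear in the ranked list still count in the denominator, so truncating the result list lowers the score.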