Hello, forgive my ignorance here (I have not worked with these English TREC collections), but is the TREC-3 test collection the same as the test collection used in the 2007 paper you referenced?
It looks like that is a different collection; it's not really possible to
compare these relevance scores across different collections.

On Wed, Jan 27, 2010 at 11:06 AM, Grant Ingersoll <gsing...@apache.org> wrote:

> On Jan 26, 2010, at 8:28 AM, Ivan Provalov wrote:
>
> > We are looking into making some improvements to the relevance ranking of
> > our search platform, which is based on Lucene. We started by running the
> > ad hoc TREC task on the TREC-3 data using out-of-the-box Lucene. The
> > reason for running this old TREC-3 data (TIPSTER Disks 1 and 2; topics
> > 151-200) was that its content matches the content of our production
> > system.
> >
> > We are currently getting an average precision of 0.14. We found some
> > format issues with the TREC-3 data which were causing an even lower
> > score; the initial average precision number was 0.09. We discovered that
> > the topics included the word "Topic:" in the <title> tag, for example
> > "<title> Topic: Coping with overcrowded prisons". By removing this term
> > from the queries, we bumped the average precision up to 0.14.
>
> There's usually a lot of this involved in running TREC. I've also seen a
> good deal of improvement from things like using phrase queries and the
> DisMax query parser in Solr (which uses DisjunctionMaxQuery in Lucene,
> among other things), and from playing around with length normalization.
>
> > Our query is based on the <title> tag of the topic, and the index field
> > is based on the <TEXT> tag of the document:
> >
> >     QualityQueryParser qqParser = new SimpleQQParser("title", "TEXT");
> >
> > Is there an average precision number which out-of-the-box Lucene should
> > be close to? For example, this 2007 TREC paper from IBM mentions 0.154:
> > http://trec.nist.gov/pubs/trec16/papers/ibm-haifa.mq.final.pdf
>
> Hard to say; I can't say I've run TREC-3. You might also ask over on the
> Open Relevance list (http://lucene.apache.org/openrelevance). I know
> Robert Muir has done a lot of experiments with Lucene on standard
> collections like TREC.
>
> I guess the bigger question back to you is: what is your goal? Is it to
> get better at TREC, or to actually tune your system?
>
> -Grant
>
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com/
>
> Search the Lucene ecosystem using Solr/Lucene:
> http://www.lucidimagination.com/search

--
Robert Muir
rcm...@gmail.com
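
For anyone in the archives who wants to reproduce a run like Ivan's: the whole
evaluation can be driven from the contrib/benchmark quality package. Below is a
minimal sketch against the Lucene 2.9/3.0 API; the file paths, the "docname"
field name, and the "lucene" run tag are assumptions you would replace with
your own values:

    import java.io.BufferedReader;
    import java.io.File;
    import java.io.FileReader;
    import java.io.PrintWriter;

    import org.apache.lucene.benchmark.quality.QualityBenchmark;
    import org.apache.lucene.benchmark.quality.QualityQuery;
    import org.apache.lucene.benchmark.quality.QualityQueryParser;
    import org.apache.lucene.benchmark.quality.QualityStats;
    import org.apache.lucene.benchmark.quality.trec.TrecJudge;
    import org.apache.lucene.benchmark.quality.trec.TrecTopicsReader;
    import org.apache.lucene.benchmark.quality.utils.SimpleQQParser;
    import org.apache.lucene.benchmark.quality.utils.SubmissionReport;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.store.FSDirectory;

    public class RunTrecQuality {
      public static void main(String[] args) throws Exception {
        // Placeholder paths: TREC topics file, qrels file, and the index.
        BufferedReader topics = new BufferedReader(new FileReader("topics.151-200"));
        BufferedReader qrels = new BufferedReader(new FileReader("qrels.151-200"));
        IndexSearcher searcher =
            new IndexSearcher(FSDirectory.open(new File("/path/to/index")), true);

        // One QualityQuery per topic; <title> becomes the "title" value.
        QualityQuery[] qqs = new TrecTopicsReader().readQueries(topics);

        // The TREC relevance judgments drive the precision computation.
        TrecJudge judge = new TrecJudge(qrels);
        PrintWriter logger = new PrintWriter(System.out, true);
        judge.validateData(qqs, logger);

        // Query from the topic "title"; search the indexed "TEXT" field.
        QualityQueryParser qqParser = new SimpleQQParser("title", "TEXT");

        // "docname" must be the stored field holding each document's DOCNO.
        QualityBenchmark benchmark =
            new QualityBenchmark(qqs, qqParser, searcher, "docname");
        SubmissionReport submitLog =
            new SubmissionReport(new PrintWriter(new File("submission.txt")), "lucene");

        QualityStats[] stats = benchmark.execute(judge, submitLog, logger);

        // Mean over all 50 topics; the "Average Precision" line is the MAP.
        QualityStats avg = QualityStats.average(stats);
        avg.log("SUMMARY", 2, logger, "  ");
        searcher.close();
      }
    }

The average-precision line in the summary is the number to compare against the
0.14 Ivan reported.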
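On the "Topic:" artifact and Grant's phrase-query suggestion: both can live in
a custom QualityQueryParser instead of pre-editing the topics file. The class
below is hypothetical (CleanTitleQQParser is not part of Lucene); it strips the
stray prefix and adds the full title as a boosted phrase alongside the
bag-of-words query, again against the 3.0 API:

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.benchmark.quality.QualityQuery;
    import org.apache.lucene.benchmark.quality.QualityQueryParser;
    import org.apache.lucene.queryParser.ParseException;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.BooleanClause;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.util.Version;

    // Hypothetical parser: cleans TREC-3 titles and combines a term query
    // with a boosted phrase query over the whole title.
    public class CleanTitleQQParser implements QualityQueryParser {
      private final String qqName, indexField;

      public CleanTitleQQParser(String qqName, String indexField) {
        this.qqName = qqName;
        this.indexField = indexField;
      }

      public Query parse(QualityQuery qq) throws ParseException {
        // Drop the stray "Topic:" prefix found in the TREC-3 <title> tags.
        String title = qq.getValue(qqName).replaceFirst("^\\s*Topic:\\s*", "");
        QueryParser qp = new QueryParser(Version.LUCENE_30, indexField,
            new StandardAnalyzer(Version.LUCENE_30));
        BooleanQuery bq = new BooleanQuery();
        bq.add(qp.parse(QueryParser.escape(title)), BooleanClause.Occur.SHOULD);
        // The exact title as a phrase, boosted; 2.0 is an arbitrary choice.
        Query phrase = qp.parse("\"" + QueryParser.escape(title) + "\"");
        phrase.setBoost(2.0f);
        bq.add(phrase, BooleanClause.Occur.SHOULD);
        return bq;
      }
    }

Pass an instance of this in place of the SimpleQQParser in the benchmark run.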
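And on length normalization: DefaultSimilarity's norm is 1/sqrt(numTerms),
which tends to punish long TIPSTER documents. One way to experiment is to
subclass it; the fourth-root curve below is an arbitrary example, not a
recommendation (contrib also ships SweetSpotSimilarity for this kind of
tuning):

    import org.apache.lucene.search.DefaultSimilarity;

    // Example tweak: dampen the length penalty relative to the default
    // 1/sqrt(numTerms) so longer documents are not down-weighted as hard.
    public class FlatterLengthNormSimilarity extends DefaultSimilarity {
      @Override
      public float lengthNorm(String fieldName, int numTerms) {
        // Fourth root instead of square root: a flatter, arbitrary curve.
        return (float) (1.0 / Math.sqrt(Math.sqrt(numTerms)));
      }
    }

Because norms are baked in at indexing time, you have to set this on the
IndexWriter (writer.setSimilarity(...)) and re-index, and set the same
similarity on the searcher before running the benchmark.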