Grant,

We are currently working on a relevancy improvement project.  We took IBM's 
paper from TREC 2007 and followed the approaches it describes to improve 
Lucene's relevance.  It also gave us some idea of Lucene's out-of-the-box 
precision performance (MAP).  In addition, we used some of the best 
practices described in the TREC book (Voorhees & Harman, 2005, MIT Press).  
We also looked into the probabilistic scoring model (BM25). 

We started by comparing "vanilla" Lucene to our Lucene-based product's 
performance.  We obtained collections and relevance judgments from past 
TRECs that were close in genre to the content we store.  We then studied 
how different tunings affected the scores.  We used Lucene's benchmarking 
module to run against the TREC data.  Even though there were a few issues 
along the way with the older TREC document/topic formats, the benchmarking 
tool was altogether great at computing MAP and showing where we stood.
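
In case it is useful to anyone else, here is roughly how we drive the 
benchmark module's quality package (org.apache.lucene.benchmark.quality).  
The file paths, field names, and run tag below are placeholders for 
illustration, not our actual setup:

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.PrintWriter;

import org.apache.lucene.benchmark.quality.Judge;
import org.apache.lucene.benchmark.quality.QualityBenchmark;
import org.apache.lucene.benchmark.quality.QualityQuery;
import org.apache.lucene.benchmark.quality.QualityQueryParser;
import org.apache.lucene.benchmark.quality.QualityStats;
import org.apache.lucene.benchmark.quality.trec.TrecJudge;
import org.apache.lucene.benchmark.quality.trec.TrecTopicsReader;
import org.apache.lucene.benchmark.quality.utils.SimpleQQParser;
import org.apache.lucene.benchmark.quality.utils.SubmissionReport;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.FSDirectory;

public class TrecQualityRun {
  public static void main(String[] args) throws Exception {
    // Placeholder paths -- point these at your TREC topics/qrels and index.
    File topicsFile = new File("trec/topics.txt");
    File qrelsFile  = new File("trec/qrels.txt");
    IndexSearcher searcher =
        new IndexSearcher(FSDirectory.open(new File("work/index")), true);

    PrintWriter logger = new PrintWriter(System.out, true);

    // Read the TREC topics and the relevance judgments.
    TrecTopicsReader topicsReader = new TrecTopicsReader();
    QualityQuery[] qqs =
        topicsReader.readQueries(new BufferedReader(new FileReader(topicsFile)));
    Judge judge = new TrecJudge(new BufferedReader(new FileReader(qrelsFile)));
    judge.validateData(qqs, logger);

    // Turn each topic's <title> into a query against the "body" field.
    QualityQueryParser qqParser = new SimpleQQParser("title", "body");

    // Run the benchmark; the summary logged at the end includes MAP.
    QualityBenchmark run = new QualityBenchmark(qqs, qqParser, searcher, "docname");
    SubmissionReport submitLog =
        new SubmissionReport(new PrintWriter(new File("submission.txt")), "lucene-run");
    QualityStats[] stats = run.execute(judge, submitLog, logger);
    QualityStats avg = QualityStats.average(stats);
    avg.log("SUMMARY", 2, logger, "  ");

    searcher.close();
  }
}

The MAP number in that summary line is what we track from run to run.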

We then applied the Sweet Spot similarity, pivoted document length 
normalization (Lnb/Ltc), and BM25 scoring algorithms.  After applying these 
scoring changes and other techniques (different stemmers, query expansion), 
we saw some improvements.  We then compared this to our current production 
system and started tuning it as well.
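
For the Sweet Spot similarity (contrib/misc) the wiring is roughly the 
following; the plateau and tf numbers are placeholders for illustration, 
not our tuned values:

import org.apache.lucene.misc.SweetSpotSimilarity;
import org.apache.lucene.search.Similarity;

public class SweetSpotSetup {
  // Returns a Similarity with a flat length-norm "plateau"; the numbers
  // here are placeholders, not our tuned values.
  public static Similarity sweetSpot() {
    SweetSpotSimilarity sim = new SweetSpotSimilarity();
    // Docs between min and max tokens all get the same length norm;
    // outside that range the norm falls off at the given steepness.
    sim.setLengthNormFactors(500, 10000, 0.5f, true);
    // Baseline tf factors flatten the contribution of very low term freqs.
    sim.setBaselineTfFactors(1.0f, 0.5f);
    return sim;
  }
}

// The same Similarity has to be used at index time (for norms) and at
// search time, e.g.:
//   writer.setSimilarity(SweetSpotSetup.sweetSpot());
//   searcher.setSimilarity(SweetSpotSetup.sweetSpot());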

Our second goal was to include the relevance measurement in the continuous 
integration tests that run nightly.  The idea is that if a system change 
inadvertently affects the scoring, we find out right away.  This second 
phase also helped us discover hidden bugs in our production system. 
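
The nightly guard itself is nothing fancy.  A sketch of the idea is below, 
where RelevanceHarness is a hypothetical wrapper around the quality run 
above and the baseline MAP is just a placeholder value:

import junit.framework.TestCase;

// Hypothetical nightly guard: fails the build if MAP drops below the
// last accepted baseline (minus a small tolerance for run-to-run jitter).
public class RelevanceRegressionTest extends TestCase {
  private static final double BASELINE_MAP = 0.182;   // placeholder value
  private static final double TOLERANCE    = 0.005;

  public void testMapDidNotRegress() throws Exception {
    double map = RelevanceHarness.runTrecBenchmark();  // mean average precision
    assertTrue("MAP regressed: " + map + " < " + (BASELINE_MAP - TOLERANCE),
               map >= BASELINE_MAP - TOLERANCE);
  }
}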

In addition to the English-based analyzers, we also studied Chinese 
analyzers and compared those results with the English collection runs, 
again using TREC data.
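
For the Chinese runs we mostly just swapped the analyzer.  A minimal sketch 
with the contrib SmartChineseAnalyzer (the field name and sample text are 
only for illustration):

import java.io.StringReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;
import org.apache.lucene.util.Version;

public class ChineseAnalysisDemo {
  public static void main(String[] args) throws Exception {
    // SmartChineseAnalyzer (contrib/analyzers) does word segmentation
    // rather than the per-character tokens of StandardAnalyzer.
    Analyzer analyzer = new SmartChineseAnalyzer(Version.LUCENE_30);
    TokenStream ts = analyzer.tokenStream("body",
        new StringReader("我们在研究中文分析器"));  // sample text: "we are studying Chinese analyzers"
    TermAttribute term = ts.addAttribute(TermAttribute.class);
    ts.reset();
    while (ts.incrementToken()) {
      System.out.println(term.term());
    }
    ts.close();
  }
}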

Some observations:
1.      Even though the Vector Space model with a Boolean OR query gives good 
MAP scores, in some products the large number of returned results makes the 
product less usable.  So, defaulting to the AND operator may be a better 
option, as was mentioned in an earlier post to this user group (see the 
sketch after this list).
2.      This TREC-based evaluation is just one of many tools to use.  For 
example, user feedback is still the most important evaluation one can do.
3.      We will continue studying how different scoring mechanisms affect 
relevance quality before deciding whether to switch from the default VSM.  
Some of our concerns are over-tuning and performance testing.
4.      The Lucene user community has been very helpful.  Robert Muir, Joaquin 
Iglesias, and others helped with applying the scoring algorithms and provided 
great suggestions. 
5.      Some of the tools we use constantly are Lucene's query Explanation and 
Luke; the sketch below shows Explanation in action.
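
To make items 1 and 5 concrete, here is a minimal sketch of defaulting the 
query parser to AND and printing the score Explanation for the top hits 
(the field name and version constant are assumptions for illustration):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Explanation;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.util.Version;

public class AndDefaultAndExplain {
  // searcher is assumed to be an already-open IndexSearcher over a "body" field.
  public static void run(IndexSearcher searcher, String userInput) throws Exception {
    QueryParser parser = new QueryParser(Version.LUCENE_30, "body",
        new StandardAnalyzer(Version.LUCENE_30));
    // Default to AND so multi-term queries require all terms, trimming the
    // long tail of marginal OR matches.
    parser.setDefaultOperator(QueryParser.AND_OPERATOR);

    Query q = parser.parse(userInput);
    TopDocs hits = searcher.search(q, 10);

    // Explanation shows exactly how each hit's score was computed -- one of
    // the tools we lean on constantly while tuning.
    for (ScoreDoc sd : hits.scoreDocs) {
      Explanation expl = searcher.explain(q, sd.doc);
      System.out.println(expl.toString());
    }
  }
}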

Thanks,

Ivan Provalov




--- On Thu, 4/29/10, Grant Ingersoll <gsing...@apache.org> wrote:

> From: Grant Ingersoll <gsing...@apache.org>
> Subject: Relevancy Practices
> To: java-user@lucene.apache.org
> Date: Thursday, April 29, 2010, 10:14 AM
> I'm putting on a talk at Lucene
> Eurocon (http://lucene-eurocon.org/sessions-track1-day2.html#1)
> on "Practical Relevance" and I'm curious as to what people
> put in practice for testing and improving relevance.  I
> have my own inclinations, but I don't want to muddy the
> water just yet.  So, if you have a few moments, I'd
> love to hear responses to the following questions.
> 
> What worked?  
> What didn't work?  
> What didn't you understand about it?  
> What tools did you use?  
> What tools did you wish you had either for debugging
> relevance or "fixing" it?
> How much time did you spend on it?
> How did you avoid over/under tuning?
> What stage of development/testing/production did you decide
> to do relevance tuning?  Was that timing planned or
> not?
> 
> 
> Thanks,
> Grant
> 




---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
