Re: boosting relevance of certain documents

Grant Ingersoll Fri, 25 Apr 2008 15:42:38 -0700

It really depends. Hand tuning scoring algs for a specific query is very prone to local maxima problems. In other words, you fix one query and break 50 others. Sometimes, a good old "configurable" hard code is the way to go. If you want a specific doc to be #1, make it number one. You will pull your hair out otherwise. In Solr, this is handled via the Query Elevation Component, but it isn't all that difficult to implement yourself.
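
A minimal sketch of the "configurable hard code" idea, for illustration only. It is not Solr's Query Elevation Component and it does not touch Lucene at all: the class name PinnedResults and its methods are hypothetical, and it assumes the application already has the engine's ranked list of document ids for a query.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: moves hand-picked document ids to the top of a result list. */
final class PinnedResults {
    // Normalized query text -> ids that must come first, in display order.
    private final Map<String, List<String>> pins = new HashMap<>();

    void pin(String query, String... docIds) {
        pins.put(normalize(query), Arrays.asList(docIds));
    }

    /** Returns the ranked ids with any ids pinned for this query moved to the front. */
    List<String> apply(String query, List<String> rankedIds) {
        List<String> pinned = pins.get(normalize(query));
        if (pinned == null) {
            return new ArrayList<>(rankedIds); // nothing configured: keep the engine's order
        }
        // LinkedHashSet keeps insertion order and drops the second copy of an id
        // that is both pinned and present in the engine's ranking.
        Set<String> merged = new LinkedHashSet<>(pinned);
        merged.addAll(rankedIds);
        return new ArrayList<>(merged);
    }

    private static String normalize(String query) {
        return query.trim().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        PinnedResults pins = new PinnedResults();
        pins.pin("fifa", "1");
        // The engine ranked id 2 first; the pin puts id 1 on top.
        System.out.println(pins.apply("FIFA ", Arrays.asList("2", "1")));
    }
}
```

Note that in this sketch a pinned id is shown even if the engine did not return it for the query; whether that is desirable depends on the application.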

Likewise, if you have a priori knowledge that a particular document is more important, then give it a relatively large boost during indexing, being aware that Lucene does not offer much granularity in terms of boosts. In other words, boost it something like 5 or 10 times, instead of 1.1 vs. 1.2.
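
To illustrate why small boosts can vanish, here is a self-contained sketch of a one-byte float encoding (3 mantissa bits, 5 exponent bits) of the kind Lucene of this era used to store index-time norms. It is written from memory and modeled on Lucene's SmallFloat encoding, so treat the constants and the assumption that boost and length norm are multiplied before encoding as unverified; the class name NormQuantization is hypothetical.

```java
/** Sketch of a lossy 8-bit float encoding (3 mantissa bits, 5 exponent bits). */
final class NormQuantization {
    private static final int ZERO_POINT = (63 - 15) << 3;

    static byte encode(float f) {
        int bits = Float.floatToRawIntBits(f);
        int small = bits >> (24 - 3); // keep sign, exponent and the top mantissa bits
        if (small <= ZERO_POINT) {
            // Underflow: zero or negative -> 0, tiny positive -> smallest step.
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (small >= ZERO_POINT + 0x100) {
            return -1; // overflow: clamp to the largest representable value
        }
        return (byte) (small - ZERO_POINT);
    }

    static float decode(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    /** The value that survives the round trip for a given boost and length norm. */
    static float storedNorm(float boost, float lengthNorm) {
        return decode(encode(boost * lengthNorm));
    }

    public static void main(String[] args) {
        float lengthNorm = 0.5f; // assumed: 1/sqrt(4 terms)
        for (float boost : new float[] {1.0f, 1.1f, 1.2f, 5.0f, 10.0f}) {
            System.out.println(boost + " -> " + storedNorm(boost, lengthNorm));
        }
    }
}
```

Under these assumptions, boosts of 1.0, 1.1 and 1.2 on a four-term field all round down to the same stored value (0.5), while 5 and 10 survive as 2.5 and 5.0.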

On the other hand, if you are truly seeing broad problems, then you need to build up a set of queries and judgments (à la TREC) or use the contrib/benchmark Quality packages. You might also look at Lucene's Similarity class. Lucene's length normalization is often less than optimal for certain types of documents (see IBM Haifa's assessment for the "Million Query" track of TREC on the Lucene Wiki).
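
A small sketch of what "a set of queries and judgments" buys you: a number you can re-check after every scoring change, so that fixing one query does not silently break others. This is plain Java and not the contrib/benchmark Quality API; the class RelevanceJudgments, the metric choice (precision at k) and the sample judgments are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical evaluation helper: precision at k over a set of judged queries. */
final class RelevanceJudgments {
    /** Fraction of the top k results that are judged relevant. */
    static double precisionAtK(List<String> ranked, Set<String> relevant, int k) {
        if (k <= 0) {
            throw new IllegalArgumentException("k must be positive");
        }
        int hits = 0;
        int limit = Math.min(k, ranked.size());
        for (int i = 0; i < limit; i++) {
            if (relevant.contains(ranked.get(i))) {
                hits++;
            }
        }
        return (double) hits / k;
    }

    /** Mean precision at k over every judged query; a query without results scores 0. */
    static double meanPrecisionAtK(Map<String, List<String>> resultsByQuery,
                                   Map<String, Set<String>> judgments, int k) {
        if (judgments.isEmpty()) {
            return 0.0;
        }
        double sum = 0.0;
        for (Map.Entry<String, Set<String>> e : judgments.entrySet()) {
            List<String> ranked = resultsByQuery.get(e.getKey());
            if (ranked == null) {
                ranked = Collections.emptyList();
            }
            sum += precisionAtK(ranked, e.getValue(), k);
        }
        return sum / judgments.size();
    }

    public static void main(String[] args) {
        // Assumed judgments, loosely based on the example index in this thread.
        Map<String, Set<String>> judgments = new HashMap<>();
        judgments.put("fifa", new HashSet<>(Arrays.asList("1")));
        judgments.put("playstation", new HashSet<>(Arrays.asList("3", "4")));

        Map<String, List<String>> results = new HashMap<>();
        results.put("fifa", Arrays.asList("2", "1"));
        results.put("playstation", Arrays.asList("3", "4", "1", "2"));

        System.out.println("mean P@1 = " + meanPrecisionAtK(results, judgments, 1));
    }
}
```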


Cheers,
Grant


On Apr 25, 2008, at 3:50 PM, Daniel Freudenberger wrote:

Thanks for your response. I already knew that the relevance is based on the term frequency, but in some cases it's just not what the user expects.

As I already mentioned, "fifa 2003 fifa 03" vs. "fifa 08" is such a case: searching for "fifa" would return the "fifa 2003 fifa 03" document first, but the "fifa 08" document is more important (from the user's point of view).


Any suggestions?

Best regards,
Daniel
-----Original Message-----
From: Jonathan Ariel [mailto:[EMAIL PROTECTED]
Sent: Friday, April 25, 2008 8:11 PM
To: java-user@lucene.apache.org
Subject: Re: boosting relevance of certain documents

Ok. So I'm not an expert on the scoring algorithm, but based on tf*idf you can tell that the returned document is more relevant because it has a higher term frequency.

Using the explain you can see the following:

Doc 1
0.643841 = (MATCH) fieldWeight(searchable:fifa in 0), product of:
 1.0 = tf(termFreq(searchable:fifa)=1)
 1.287682 = idf(docFreq=2)
 0.5 = fieldNorm(field=searchable, doc=0)

Doc 2
0.68289655 = (MATCH) fieldWeight(searchable:fifa in 1), product of:
 1.4142135 = tf(termFreq(searchable:fifa)=2)
 1.287682 = idf(docFreq=2)
 0.375 = fieldNorm(field=searchable, doc=1)
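
The two explain outputs above can be reproduced by hand. The sketch below multiplies the three factors, assuming the default formulas of that Lucene generation (tf = sqrt(termFreq), idf = ln(numDocs / (docFreq + 1)) + 1) and assuming numDocs = 4, inferred from the four-row example index. The fieldNorm values are taken as given from the explain output; the class name ExplainArithmetic is hypothetical.

```java
/** Recomputes fieldWeight = tf * idf * fieldNorm from the explain output. */
final class ExplainArithmetic {
    static double tf(int termFreq) {
        return Math.sqrt(termFreq);
    }

    static double idf(int docFreq, int numDocs) {
        return Math.log((double) numDocs / (docFreq + 1)) + 1.0;
    }

    static double fieldWeight(int termFreq, int docFreq, int numDocs, double fieldNorm) {
        return tf(termFreq) * idf(docFreq, numDocs) * fieldNorm;
    }

    public static void main(String[] args) {
        int numDocs = 4; // assumption: the example index holds four documents
        // Doc 1: "fifa" once, fieldNorm 0.5.
        System.out.println(fieldWeight(1, 2, numDocs, 0.5));
        // Doc 2: "fifa" twice, fieldNorm 0.375 (a longer field, so a smaller norm).
        System.out.println(fieldWeight(2, 2, numDocs, 0.375));
    }
}
```

The doubled term frequency raises tf from 1.0 to about 1.414, which outweighs the drop in fieldNorm from 0.5 to 0.375, so the longer document ends up with the slightly higher score.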

On Fri, Apr 25, 2008 at 2:30 PM, Daniel Freudenberger <[EMAIL PROTECTED]> wrote:

I'm using the StandardAnalyzer - hope this answers your question (I'm quite new to the Lucene thing)

-----Original Message-----
From: Jonathan Ariel [mailto:[EMAIL PROTECTED]
Sent: Friday, April 25, 2008 6:59 PM
To: java-user@lucene.apache.org
Subject: Re: boosting relevance of certain documents

How are you analyzing the searchable field?

On Fri, Apr 25, 2008 at 12:49 PM, Daniel Freudenberger <[EMAIL PROTECTED]> wrote:

Hello,



I'm using Lucene within a new project and I'm not sure how to solve the following problem: My index consists of the two attributes "id" and "searchable". "id" is the id of a product and "searchable" is a combination of the product name and its category name.



example:

id     searchable
1      fifa 08 - playstation 3
2      fifa 2003 fifa 03 - playstation 3
3      playstation 60gb hdd - playstation 3
4      playstation i like you - playstation 3



When searching for "fifa", Lucene returns the product with id 2 first, whereas id 1 ("fifa 08") would be the much more relevant result (from the user's point of view). The same problem arises when searching for "playstation": the customer expects products having "playstation" in their names first, ideally the console itself. In reality, however, he gets all possible products which are in the "playstation" category as well.

My idea was to introduce another attribute, relevance, which may increase the relevance of an entry. The actual relevance score shouldn't be suppressed completely, though; the new attribute should only be taken into account among products that are similarly relevant for a specific search term.
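
One possible reading of this idea, sketched in plain Java: group hits into score bands, and let the extra attribute decide the order only within a band. Everything here is an assumption for illustration (the class TieredRanking, the Hit type, and the band width of 0.1); it is applied to scores after the search and is not a Lucene API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical re-ranker: an extra attribute breaks ties among similarly scored hits. */
final class TieredRanking {
    static final class Hit {
        final String id;
        final double score;   // score reported by the search engine
        final int relevance;  // the extra, hand-assigned attribute

        Hit(String id, double score, int relevance) {
            this.id = id;
            this.score = score;
            this.relevance = relevance;
        }
    }

    /** Scores in the same band of the given width count as "similarly relevant". */
    static long band(double score, double bandWidth) {
        return (long) Math.floor(score / bandWidth);
    }

    /** Orders by band (highest first), then by the attribute, then by raw score. */
    static List<Hit> rerank(List<Hit> hits, double bandWidth) {
        if (bandWidth <= 0) {
            throw new IllegalArgumentException("bandWidth must be positive");
        }
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator
                .comparingLong((Hit h) -> band(h.score, bandWidth)).reversed()
                .thenComparing(Comparator.comparingInt((Hit h) -> h.relevance).reversed())
                .thenComparing(Comparator.comparingDouble((Hit h) -> h.score).reversed()));
        return sorted;
    }

    public static void main(String[] args) {
        List<Hit> hits = Arrays.asList(
                new Hit("2", 0.68289655, 0),  // "fifa 2003 fifa 03"
                new Hit("1", 0.643841, 10));  // "fifa 08", given a higher attribute
        for (Hit h : rerank(hits, 0.1)) {
            System.out.println(h.id + " " + h.score);
        }
    }
}
```

Fixed bands have an edge effect: two scores just either side of a band boundary are treated as not similar even though they are close, so the band width needs tuning against real queries.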



Does anybody have an idea on how to solve this problem?



Thank you in advance,

Daniel



---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]





--------------------------
Grant Ingersoll

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ






