Re: Lucene scoring: coord_q_d factor

Karl Koch Wed, 13 Dec 2006 07:01:11 -0800

Hello Steven,

Unfortunately, I don't have access to these books right now. I will try to get 
hold of them. Thank you for these pointers. :)

I had a quick look at "coordination level matching" on the web and found 
evidence that it was an early retrieval strategy. My main question is why one 
should use coordination level matching if one is already doing (proper) TFxIDF 
based matching. When I look at Lucene's scoring formula, it seems to me that 
two kinds of matching are performed and combined in a single formula.

In the paper, "Exploiting the Similarity of Non-matching Terms at Retrieval 
Time" which can be found here:

http://www.cis.strath.ac.uk/~fabioc/papers/00-jir.pdf

coordination level matching is compared directly with TFxIDF. To me, it seems 
that coordination level matching could be used instead of TFxIDF, but not 
together with it. In this context, I wonder what benefit coordination level 
matching has in combination with TFxIDF.
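
To make concrete what I mean by two kinds of matching being combined, here is 
a minimal Python sketch. It is only my simplified reading of the formula, not 
Lucene's code: it assumes tf = sqrt(freq), squares idf, takes coord as the 
fraction of query terms found in the document, and leaves out queryNorm, 
boosts and length normalisation. The function names, the two toy documents 
and the idf values are made up for illustration.

```python
import math

def tfidf_sum(query_terms, doc_freqs, idf):
    """Sum sqrt(freq) * idf^2 over the query terms that occur in the document."""
    return sum(
        math.sqrt(doc_freqs[t]) * idf[t] ** 2
        for t in query_terms
        if doc_freqs.get(t, 0) > 0
    )

def coord(query_terms, doc_freqs):
    """Fraction of query terms that occur in the document (overlap / maxOverlap)."""
    overlap = sum(1 for t in query_terms if doc_freqs.get(t, 0) > 0)
    return overlap / len(query_terms)

def score(query_terms, doc_freqs, idf, use_coord=True):
    """Simplified score: the TFxIDF sum, optionally multiplied by coord."""
    base = tfidf_sum(query_terms, doc_freqs, idf)
    return base * coord(query_terms, doc_freqs) if use_coord else base

# Hypothetical toy data: equal idf for every term keeps the arithmetic easy.
query = ["lucene", "scoring", "coord"]
idf = {"lucene": 1.0, "scoring": 1.0, "coord": 1.0}
doc_a = {"lucene": 16}                            # one query term, repeated often
doc_b = {"lucene": 1, "scoring": 1, "coord": 1}   # every query term, once each

for name, doc in (("A", doc_a), ("B", doc_b)):
    print(name,
          "tfidf only:", score(query, doc, idf, use_coord=False),
          "with coord:", round(score(query, doc, idf), 3))
```

In this toy case the document that repeats a single query term ranks first on 
the TFxIDF sum alone, while the coord factor moves the document that matches 
all three query terms to the top.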

It is likely that I am misunderstanding something here. Perhaps with your 
help I can untangle it a bit further. As I said earlier, I am only looking 
for a reasonable explanation (ideally backed by some evidence from the 
literature) of why it is used together with TFxIDF.

Thank you,
Karl

-------- Original Message --------
Date: Tue, 12 Dec 2006 17:15:48 -0500
From: Steven Rowe <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Subject: Re: Lucene scoring: coord_q_d factor

> Karl Koch wrote:
> > Is there any other paper that actually shows the benefit of doing 
> > this particular normalisation with coord_q_d? I am not suggesting
> > here that it is not useful, I am just looking for evidence how the
> > idea developed.
> 
> I think it's a mischaracterization to call coordination a
> "normalization".  In my mind, "normalization" is something applied
> equally to all documents' scores.  The coordination component of a
> document's score varies from document to document, and so doesn't meet
> this criterion.
> 
> I repeat the citation of the book cited by the paper I cited :) :
> 
> >> Salton, G. & McGill, M. Introduction to Modern Information
> >> Retrieval. McGraw-Hill, New York, 1983.
> 
> In addition to the above book, here are two other books that I've seen
> cited as describing "coordination-level matching" (a.k.a. "overlap
> ranking"):
> 
> Salton, G. (1968). Automatic information organization and retrieval.
> New York: McGraw-Hill.
> 
> Lancaster, F.W. (1979). Information retrieval systems: Characteristics,
> testing and evaluation (2nd ed.). New York: Wiley.
> 
> I don't know the answer to your larger question: why use a coordination
> component in a similarity measure when other components (tf*idf) seem to
> serve the same function?  What you seem to be looking for is a study
> that directly compares a system using a coordination component in its
> similarity measure with the *same* system, varying the measure only in
> that coordination is elided.  Unfortunately, I know of no such study.
> 
> Good luck,
> Steve
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
