Hello Steven, unfortunately I don't have access to these books right now. I will try to get hold of them. Thank you for these pointers. :)
I had a quick look at "coordination level matching" on the web and found evidence that it was an early retrieval strategy. My main question is why one should use coordination level matching if one is already doing (proper) TFxIDF-based matching. When I look at Lucene's scoring formula, it seems to me that two kinds of matching are performed and combined in a single formula.

In the paper "Exploiting the Similarity of Non-matching Terms at Retrieval Time", which can be found at http://www.cis.strath.ac.uk/~fabioc/papers/00-jir.pdf, coordination level matching is directly compared with TFxIDF. To me, it seems that coordination level matching could be used if I don't want to use TFxIDF, but not together with it. In this context, I wonder what benefit coordination level matching has in combination with TFxIDF.

It is likely that I have some kind of misunderstanding here. Perhaps with your help I can untangle it a bit further. As I said earlier, I am only looking for a reasonable explanation (perhaps backed by some evidence in the literature) that makes it clear why it is used together with TFxIDF.

Thank you,
Karl

-------- Original Message --------
Date: Tue, 12 Dec 2006 17:15:48 -0500
From: Steven Rowe <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Subject: Re: Lucene scoring: coord_q_d factor

> Karl Koch wrote:
> > Is there any other paper that actually shows the benefit of doing
> > this particular normalisation with coord_q_d? I am not suggesting
> > here that it is not useful, I am just looking for evidence how the
> > idea developed.
>
> I think it's a mischaracterization to call coordination a
> "normalization". In my mind, "normalization" is something applied
> equally to all documents' scores. The coordination component of a
> document's score varies from document to document, and so doesn't meet
> this criterion.
>
> I repeat the citation of the book cited by the paper I cited :) :
>
> >> Salton, G. & McGill, M.
> >> Introduction to Modern Information Retrieval. McGraw-Hill, New York, 1983.
>
> In addition to the above book, here are two other books that I've seen
> cited as describing "coordination-level matching" (a.k.a. "overlap
> ranking"):
>
> Salton, G. (1968). Automatic information organization and retrieval.
> New York: McGraw-Hill.
>
> Lancaster, F.W. (1979). Information retrieval systems: Characteristics,
> testing and evaluation (2nd ed.). New York: Wiley.
>
> I don't know the answer to your larger question: why use a coordination
> component in a similarity measure when other components (tf*idf) seem to
> serve the same function? What you seem to be looking for is a study
> that directly compares a system using a coordination component in its
> similarity measure with the *same* system, varying the measure only in
> that coordination is elided. Unfortunately, I know of no such study.
>
> Good luck,
> Steve
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
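
To make the question about combining a coordination factor with TFxIDF concrete, here is a minimal sketch of a score that multiplies a tf*idf sum by the fraction of query terms a document matches. It is only loosely modeled on Lucene's default similarity (square root of term frequency, logarithmic idf); boosts, length normalization and query normalization are left out, and all names, documents and statistics below are made up for illustration. It shows one effect the factor can have on a ranking and is not evidence from the literature.

```python
import math


def tf(freq):
    # Dampened term frequency (assumption: square root, as in Lucene's default).
    return math.sqrt(freq)


def idf(doc_freq, num_docs):
    # Logarithmic inverse document frequency (assumption: Lucene-like form).
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def coord(overlap, max_overlap):
    # Coordination factor: fraction of query terms found in the document.
    return overlap / max_overlap


def score(query_terms, doc_term_freqs, doc_freqs, num_docs, use_coord=True):
    # Sum tf*idf over the query terms that occur in the document,
    # then optionally scale the sum by the coordination factor.
    matched = [t for t in query_terms if doc_term_freqs.get(t, 0) > 0]
    tfidf_sum = sum(
        tf(doc_term_freqs[t]) * idf(doc_freqs[t], num_docs) for t in matched
    )
    if not use_coord:
        return tfidf_sum
    return coord(len(matched), len(query_terms)) * tfidf_sum


# Hypothetical collection statistics and documents.
num_docs = 1000
doc_freqs = {"lucene": 100, "scoring": 100, "coord": 100}
query = ["lucene", "scoring", "coord"]

doc_a = {"lucene": 1, "scoring": 1, "coord": 1}  # matches every query term once
doc_b = {"lucene": 25}                           # matches one term many times

for use_coord in (False, True):
    a = score(query, doc_a, doc_freqs, num_docs, use_coord)
    b = score(query, doc_b, doc_freqs, num_docs, use_coord)
    print(f"use_coord={use_coord}: doc_a={a:.2f} doc_b={b:.2f}")
```

In this toy setup the document that repeats a single query term ranks first under the plain tf*idf sum, while the document that matches all query terms ranks first once the coordination factor is applied. Whether that behaviour helps in practice is exactly the open question in this thread.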