Basic question on lucene query processing

Kelly Vista Mon, 13 Mar 2006 11:59:26 -0800

Hi -

I have a basic question about how queries are processed in Lucene. I understand that Lucene uses a variation of the vector space model to determine document similarity. In particular, I think it computes some sort of normalized TF-IDF score for a query against the collection of documents.
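
To make "normalized TF-IDF score" concrete, here is a minimal sketch of the kind of score meant here. It is only an illustration under stated assumptions, not Lucene's actual formula: the function name tfidf_score, the "+ 1" in the IDF term, the length normalization, and the toy documents are all made up for the example.

```python
import math
from collections import Counter

def tfidf_score(query_terms, doc_terms, all_docs):
    """Toy normalized TF-IDF score of one document for a query.

    Hypothetical illustration only; not Lucene's scoring formula.
    """
    n_docs = len(all_docs)
    tf = Counter(doc_terms)  # term frequencies within this document
    score = 0.0
    for term in query_terms:
        # Document frequency: how many documents contain the term.
        df = sum(1 for d in all_docs if term in d)
        if df == 0 or tf[term] == 0:
            continue
        idf = math.log(n_docs / df) + 1.0  # rarer terms weigh more
        score += tf[term] * idf
    # Normalize by document length so long documents are not favored.
    return score / math.sqrt(len(doc_terms)) if doc_terms else 0.0

# Toy collection (assumed data, already tokenized).
docs = [
    ["lucene", "is", "a", "search", "library"],
    ["vector", "space", "model"],
    ["lucene", "uses", "an", "inverted", "index"],
]
scores = [tfidf_score(["lucene"], d, docs) for d in docs]
```

Note that this sketch calls the score function once per document, which is exactly the pattern the question below is about.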

However, my question is this. In order to compute the TF-IDF score with respect to a particular document, it would seem that Lucene would need to iterate over all possible documents. For example, given a query q and a document d, compute score(q, d). In order to identify the highest score, it would seem that it would need to look at *all* documents (or else, how does it know how a query evaluates against each document?). This seems very inefficient, but I'm sure it's not the case -- as I have heard that Lucene is generally pretty efficient.

If someone can please help me understand whether or not this is the case, I would appreciate it.

Just a note: it strikes me that an alternative way to do things is to first identify the set of documents that contain the term (i.e., a grep) before doing the iteration. In fact, this first step is often more complex in other systems where computing score() is more expensive.
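
The "find the matching documents first" idea can be sketched with an inverted index, a map from each term to the documents that contain it. This is only a minimal sketch, assuming a toy in-memory index and a simplified TF-IDF weight; build_index, search, and the sample documents are hypothetical names and data, and nothing here is taken from Lucene's code.

```python
import math
from collections import defaultdict

def build_index(docs):
    """Map each term to {doc_id: term frequency} (a toy postings list)."""
    index = defaultdict(dict)
    for doc_id, terms in enumerate(docs):
        for term in terms:
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index

def search(query_terms, index, n_docs):
    """Score only documents that appear in some query term's postings."""
    scores = defaultdict(float)
    for term in query_terms:
        postings = index.get(term, {})
        if not postings:
            continue  # term occurs nowhere, nothing to score
        idf = math.log(n_docs / len(postings)) + 1.0  # simplified weight
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf  # accumulate per matching document
    # Highest score first; documents with no query term never show up.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy collection (assumed data, already tokenized).
docs = [
    ["lucene", "is", "a", "search", "library"],
    ["vector", "space", "model"],
    ["lucene", "uses", "an", "inverted", "index"],
]
index = build_index(docs)
results = search(["lucene", "index"], index, len(docs))
```

In this sketch the work is proportional to the lengths of the postings lists for the query terms rather than to the size of the collection, so the second document is never touched for this query.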


Thanks,

