In the Javadoc page for the Similarity class, it says,

"Lucene combines Boolean model (BM) of Information Retrieval with Vector Space 
Model (VSM) of Information Retrieval - documents "approved" by BM are scored by 
VSM."

Is the Vector Space Model referred to here different from the term 
vectors that can optionally be stored in index fields? It sounds like Lucene 
uses the vector space model in all cases to rank the returned results, not 
only when indexing with term vectors is enabled. If you have indexed without 
term vectors, what does Lucene use to score "approved" documents? And if you 
have indexed with term vectors, what does that enable you to do that you 
couldn't do with an index without term vectors?

Is there a kind of search in Lucene in which documents are "approved" by VSM as 
well as scored by it, or does that even make sense? I understand how 
similarity works when comparing two documents, but I can't imagine that it 
would work to search by comparing a term vector built from a set of search 
terms against each of the term vectors in an index, one at a time. Is there a 
more efficient way of searching with a term vector of search terms, other than 
using its terms in a Boolean search?
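
To make the brute-force approach I am picturing concrete, here is a minimal 
sketch in plain Java. It does not use any Lucene API: the class and method 
names (BruteForceVsm, termVector, cosine, scoreAll) are made up for 
illustration, and raw term counts stand in for whatever weighting Lucene 
actually applies.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not how Lucene is implemented.
public class BruteForceVsm {

    // Build a raw term-frequency vector from whitespace-separated text.
    static Map<String, Integer> termVector(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String term : text.toLowerCase().trim().split("\\s+")) {
            if (!term.isEmpty()) {
                tf.merge(term, 1, Integer::sum);
            }
        }
        return tf;
    }

    // Euclidean length of a term-frequency vector.
    static double norm(Map<String, Integer> v) {
        double sum = 0.0;
        for (int f : v.values()) {
            sum += (double) f * f;
        }
        return Math.sqrt(sum);
    }

    // Cosine of the angle between two term-frequency vectors (0.0 if either is empty).
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += (double) e.getValue() * other;
            }
        }
        double na = norm(a);
        double nb = norm(b);
        return (na == 0.0 || nb == 0.0) ? 0.0 : dot / (na * nb);
    }

    // Compare the query vector against every document vector, one at a time.
    static Map<String, Double> scoreAll(String query, Map<String, String> docs) {
        Map<String, Integer> q = termVector(query);
        Map<String, Double> scores = new LinkedHashMap<>();
        for (Map.Entry<String, String> d : docs.entrySet()) {
            scores.put(d.getKey(), cosine(q, termVector(d.getValue())));
        }
        return scores;
    }
}
```

Every query here touches every document in the collection, which is why I 
doubt this is how a real search could work at any scale.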

I am asking because my boss asked me to list all of the ways that Lucene uses 
vectors in indexing and search, and my answer revealed a lot of gaps in my 
understanding.
Thanks,
Mike
