Re: AW: Boolean Query

Doug Cutting Thu, 12 Jan 2006 09:53:58 -0800

Klaus wrote:

I have tried to study the Lucene scoring in the default similarity. Can
anyone explain to me how this similarity was designed? I have read a lot of IR
literature, but I have never seen an equation like the one used in Lucene.
Why is this better than the normal cosine measure?

It degenerates to the normal cosine measure. So it's the cosine measure with a few bells and whistles.

The tf(), idf(), lengthNorm() and queryNorm() are directly from the cosine measure, although lengthNorm()'s default implementation uses an approximation.
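
A minimal Python sketch of these four factors may help. It assumes the formulas used by Lucene's DefaultSimilarity of this era (square-root tf, log-based idf, 1/sqrt(numTerms) length norm); the Python function names, the sample counts, and exact_doc_norm() are illustrative and not part of Lucene. One way to read "approximation": because tf is sqrt(freq), the squared tf values sum to the number of terms, so 1/sqrt(numTerms) would be the exact cosine normalizer only if idf were left out of the document vector.

```python
import math

def tf(freq):
    # Term-frequency factor: square root of the raw count.
    return math.sqrt(freq)

def idf(doc_freq, num_docs):
    # Inverse document frequency: rarer terms weigh more.
    return math.log(num_docs / (doc_freq + 1)) + 1.0

def length_norm(num_terms):
    # Approximate document normalizer: depends only on field length.
    return 1.0 / math.sqrt(num_terms)

def query_norm(sum_of_squared_weights):
    # Query normalizer: 1 / Euclidean length of the query weight vector.
    return 1.0 / math.sqrt(sum_of_squared_weights)

def exact_doc_norm(term_freqs, doc_freqs, num_docs):
    # Hypothetical helper: the exact cosine normalizer,
    # 1 / Euclidean length of the tf*idf document vector.
    total = sum((tf(f) * idf(doc_freqs[t], num_docs)) ** 2
                for t, f in term_freqs.items())
    return 1.0 / math.sqrt(total)

# Made-up counts for one 16-term document in a 1000-document index.
term_freqs = {"lucene": 4, "scoring": 1, "the": 11}
doc_freqs = {"lucene": 9, "scoring": 49, "the": 999}
num_docs = 1000

approx = length_norm(sum(term_freqs.values()))
exact = exact_doc_norm(term_freqs, doc_freqs, num_docs)
sum_tf_squared = sum(tf(f) ** 2 for f in term_freqs.values())

print("approximate norm:", approx)
print("exact cosine norm:", exact)
print("sum of tf^2:", sum_tf_squared)
```

The approximate norm can be computed once at index time from the field length alone, whereas the exact one needs the idf of every term in the document, which changes as the index grows.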

So the non-standard bits are getBoost(), which permits incorporation of a priori document weights, like PageRank, and coord(), which makes ORs more AND-like. Cosine is OR-based, but, for short queries over large collections, AND tends to give better results than OR.
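
The effect of these two factors can be sketched in a few lines of Python. This is a simplified model, not Lucene's implementation: coord() is taken to be the fraction of query terms a document matches, the per-term weights are made-up numbers standing in for the normalized tf*idf products, and doc_boost is a plain multiplier standing in for getBoost().

```python
def coord(overlap, max_overlap):
    # Fraction of the query's terms that the document matched.
    return overlap / max_overlap

def score(matched_weights, num_query_terms, doc_boost=1.0, use_coord=True):
    # Plain cosine-style OR score: sum of the weights of matching terms.
    base = sum(matched_weights)
    c = coord(len(matched_weights), num_query_terms) if use_coord else 1.0
    return c * doc_boost * base

# Three-term query. Document A matches every term weakly,
# document B matches a single term strongly (weights are invented).
doc_a = [0.2, 0.2, 0.2]
doc_b = [0.7]

plain_a = score(doc_a, 3, use_coord=False)
plain_b = score(doc_b, 3, use_coord=False)
coord_a = score(doc_a, 3)
coord_b = score(doc_b, 3)
boosted_b = score(doc_b, 3, doc_boost=2.0, use_coord=False)

print("without coord: A =", plain_a, " B =", plain_b)
print("with coord:    A =", coord_a, " B =", coord_b)
print("B with doc_boost=2, no coord:", boosted_b)
```

Without coord the single strong match outranks the document containing all three terms; with coord the order flips, which is the AND-like behavior. With doc_boost left at 1.0 and use_coord=False the function is just the OR-style sum again.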


Doug
