Klaus wrote:
I have tried to study the Lucene scoring in the default similarity. Can
anyone explain to me how this similarity was designed? I have read a lot of IR
literature, but I have never seen an equation like the one used in Lucene.
Why is this better than the normal cosine measure?

It degenerates to the normal cosine measure. So it's the cosine measure with a few bells and whistles.

The tf(), idf(), lengthNorm() and queryNorm() factors come directly from the cosine measure, although lengthNorm()'s default implementation uses an approximation.
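
To make the mapping to the cosine measure concrete, here is a minimal Python sketch of those four factors. The formulas (square-root tf, log-based idf, and so on) are assumptions modeled on Lucene's classic default similarity and are not given in this message; cosine_like_score() and its parameters are hypothetical names used only for illustration.

```python
import math

# Assumed defaults, modeled on Lucene's classic default similarity.
# None of these formulas are stated in the message itself.

def tf(freq):
    # Term frequency, dampened so repeated terms count less and less.
    return math.sqrt(freq)

def idf(doc_freq, num_docs):
    # Inverse document frequency: rare terms get a higher weight.
    return math.log(num_docs / (doc_freq + 1)) + 1.0

def length_norm(num_terms):
    # The approximation: normalize by the number of terms in the document
    # instead of the true Euclidean length of its weight vector.
    return 1.0 / math.sqrt(num_terms)

def query_norm(sum_of_squared_weights):
    # Normalizes the query vector to unit length.
    return 1.0 / math.sqrt(sum_of_squared_weights)

def cosine_like_score(query_terms, doc_term_freqs, doc_freqs, num_docs):
    """Score one document against a query (hypothetical helper; no boosts, no coord)."""
    q_weights = {t: idf(doc_freqs[t], num_docs) for t in query_terms}
    q_norm = query_norm(sum(w * w for w in q_weights.values()))
    doc_len = sum(doc_term_freqs.values())
    if doc_len == 0:
        return 0.0
    total = 0.0
    for term, q_weight in q_weights.items():
        freq = doc_term_freqs.get(term, 0)
        if freq:
            # Normalized query weight times document weight: one term of the dot product.
            total += (q_weight * q_norm) * (tf(freq) * idf(doc_freqs[term], num_docs))
    return total * length_norm(doc_len)
```

Under these assumptions, a document matching more query terms, or matching the same terms in fewer total words, gets a higher score, which is the behavior expected of a cosine-style measure.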

So the non-standard bits are getBoost(), which permits incorporation of a priori document weights, like PageRank, and coord(), which makes ORs more AND-like. Cosine is OR-based, but for short queries over large collections, AND tends to give better results than OR.
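
A small Python sketch of how these two extra factors could change a ranking. It assumes coord() is the fraction of query terms a document matches and that the boost is a plain multiplier on the score; final_score() and the example numbers are hypothetical and not taken from Lucene.

```python
def coord(overlap, max_overlap):
    # Assumed form: fraction of the query's terms found in the document.
    return overlap / max_overlap

def final_score(base_score, matched_terms, query_terms, doc_boost=1.0):
    # base_score stands in for the cosine-style score of the matched terms.
    # doc_boost stands in for an a priori document weight (for example a link-based rank).
    return base_score * coord(matched_terms, query_terms) * doc_boost

# Document A matches only 1 of 3 query terms, but with a high cosine-style score.
score_a = final_score(0.9, matched_terms=1, query_terms=3)
# Document B matches all 3 query terms with a lower cosine-style score.
score_b = final_score(0.6, matched_terms=3, query_terms=3)
```

Without coord, A (0.9) would outrank B (0.6). With coord, A is scaled down by 1/3 while B is left unchanged, so the document matching every term wins, which is the AND-like behavior described above.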

Doug
