Is it possible to compute a theoretical maximum score for a given query if constraints are placed on 'tf' and 'lengthNorm'? If so, scores could be compared to a 'perfect score' (a feature request from our customers)
Here are some related threads on this: In this thread: http://www.nabble.com/Newbie-questions-re%3A-scoring-td4228776.html#a4228776 Hoss writes: > the only way I can think of to fairly compare scores from queries for > foo:bar with queries for yak:baz is to normalize them relative a maximum > possible score across the entire term query space -- but finding that > maximum is a pretty complicated problem just for simple term queries ... > when you start talking about more complicated query structures you really > get messy -- and even then it's only fair as long as the query structures > are identical, you can never compare the scores from apples and oranges And in this thread: http://www.nabble.com/non-relative-scoring-td8956299.html#a8956299 Walt writes: > A tf.idf engine, like Lucene, might not have a maximum score. > What if a document contains the word a thousand times? > A million times? It seems that if 'tf' is limited to a max value and 'lengthNorm' is a constant, it might be possible, at least for 'simple' term queries. But Hoss says that things get messing with complicated queries. Could someone elaborate a bit? Does the index contain enough info to do this efficiently? I realize that scores values must be interpreted 'carefully', but I'm seeing a push to get more leverage from the absolute values, not just the relative values. Peter