RE: Newbie questions re: scoring

Chris Hostetter Thu, 04 May 2006 17:52:37 -0700

: That link appears to be referring to normalized scores (everything is <
: 1.0).  Is it also not safe to use a threshold for raw scores?


Nope.  The basic flaw in comparing scores between two queries still holds.
The early messages in the linked threads go into more detail, but as I
recall, the basic problem has to do with the way idf and docFreq come into
play.  Just because a term query for foo:bar says that document A has a
score of 2.2 and B has a score of 6.6, and a term query for yak:baz says
that document X has a score of 2.2 and Y has a score of 6.6, doesn't mean
X is as relevant to yak:baz as A is to foo:bar.  It just means that the
relative quality of B compared to A is the same as the relative quality of
Y compared to X for their respective queries.  (Once they're normalized,
even that goes out the window.)
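
To make the idf/docFreq point concrete, here is a rough sketch.  It assumes
a simplified tf * idf^2 score modeled on the classic Lucene similarity
(tf = sqrt(freq), idf = 1 + ln(numDocs / (docFreq + 1))) and ignores field
norms, boosts and queryNorm.  The index size, docFreq values, term
frequencies and the threshold are all made up for illustration.

```python
import math

NUM_DOCS = 1000  # hypothetical index size


def idf(doc_freq):
    # Classic-Lucene-style idf: rarer terms get a larger weight.
    return 1.0 + math.log(NUM_DOCS / (doc_freq + 1))


def term_score(term_freq, doc_freq):
    # Simplified single-term score; real scoring has more factors.
    return math.sqrt(term_freq) * idf(doc_freq) ** 2


# Assumption: foo:bar is a common term, yak:baz is a rare one.
FOO_BAR_DOC_FREQ = 499
YAK_BAZ_DOC_FREQ = 9

a = term_score(1, FOO_BAR_DOC_FREQ)  # doc A: term occurs once
b = term_score(9, FOO_BAR_DOC_FREQ)  # doc B: term occurs nine times
x = term_score(1, YAK_BAZ_DOC_FREQ)  # doc X: term occurs once
y = term_score(9, YAK_BAZ_DOC_FREQ)  # doc Y: term occurs nine times

# Within each query the ratio between the two docs is the same ...
same_ratio = math.isclose(b / a, y / x)

# ... but the absolute values live on different scales, so one fixed
# cutoff treats the two queries very differently.
THRESHOLD = 10.0  # arbitrary cutoff, for illustration only
best_foo_bar_hit_rejected = b < THRESHOLD
worst_yak_baz_hit_accepted = x > THRESHOLD
```

In this toy setup A and X match their terms equally well (one occurrence
each), yet X scores far higher purely because yak:baz is rarer, which is
why a single raw-score threshold isn't meaningful across queries.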

The only way I can think of to fairly compare scores from queries for
foo:bar with queries for yak:baz is to normalize them relative to a maximum
possible score across the entire term query space.  But finding that
maximum is a pretty complicated problem just for simple term queries, and
when you start talking about more complicated query structures it really
gets messy.  Even then it's only fair as long as the query structures
are identical; you can never compare the scores from apples and oranges.
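
As a rough illustration of that normalization idea for plain term queries
only, the sketch below brute-forces the maximum score over every term in a
tiny in-memory index and divides by it.  The postings, the index size and
the simplified tf * idf^2 score are assumptions for illustration, not how
Lucene does anything; on a real index, visiting every term and every
posting like this is exactly the expensive part.

```python
import math

NUM_DOCS = 1000  # hypothetical index size

# Hypothetical toy postings: term -> {doc id: term frequency}.
POSTINGS = {
    "foo:bar": {"A": 1, "B": 9, "C": 4},
    "yak:baz": {"X": 1, "Y": 9},
}


def term_score(term_freq, doc_freq):
    # Simplified classic-style score: sqrt(tf) * idf^2, no norms or boosts.
    idf = 1.0 + math.log(NUM_DOCS / (doc_freq + 1))
    return math.sqrt(term_freq) * idf ** 2


def scores(term):
    # Raw score of every document containing the term.
    docs = POSTINGS[term]
    return {doc: term_score(tf, len(docs)) for doc, tf in docs.items()}


# Brute force: score every document for every term and keep the largest.
GLOBAL_MAX = max(s for term in POSTINGS for s in scores(term).values())


def normalized(term):
    # Scores relative to the best possible single-term score in the index.
    return {doc: s / GLOBAL_MAX for doc, s in scores(term).items()}


foo_bar = normalized("foo:bar")
yak_baz = normalized("yak:baz")
```

Note that this only covers single-term queries over one shared index; it
says nothing about how to find the maximum for other query structures.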

-Hoss

