In our application we have multiple fields that are searched, so "fast car" becomes:

+(field1:fast field2:fast field3:fast) +(field1:car field2:car field3:car)

I understand that the default sqrt implementation of tf() would help with the "lopsided score" phenomenon for searches within a single field. But when searching across multiple fields, this effect is obscured, since each matching field adds to the score of that clause.

Is there a way to "peek" at the scores of each clause and adjust based on how divergent the scores are? Or is there an easier way to do this that I'm just not seeing?

Andy

On 9/18/06, Paul Elschot <[EMAIL PROTECTED]> wrote:
On Monday 18 September 2006 23:08, Andy Liu wrote:
> For multi-word queries, I would like to reward documents that contain a more
> even distribution of each word and penalize documents that have a skewed
> distribution. For example, if my search query is:
>
> +content:fast +content:car
>
> I would prefer a document that contains each word an equal number of times
> over a document that contains the word "fast" 100 times and the word "car" 1
> time. In other words, I would like to compare the scores of each
> BooleanQuery term and adjust the score according to the distribution.
>
> Can somebody point me in the right direction as to how I would implement
> this?

It's already there in DefaultSimilarity.tf() which is the square root:

(sqrt(1) + sqrt(1)) > (sqrt(0) + sqrt(2))

Regards,
Paul Elschot
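
To make the arithmetic in this thread concrete, here is a minimal sketch in plain Python rather than Lucene code. It assumes that a clause's score is simply the sum of sqrt(freq) over the fields it matched, and it ignores idf, norms and coord. The helper names (tf, clause_score, clause_scores, balanced_score), the term counts, and the min/max balance factor are all hypothetical illustrations of the "peek and adjust" idea, not anything Lucene provides.

```python
import math


def tf(freq):
    # Square-root term-frequency factor, the shape described above for
    # DefaultSimilarity.tf().
    return math.sqrt(freq)


def clause_score(field_freqs):
    # Assumption: a clause's score is the sum of tf over every field it matched.
    return sum(tf(freq) for freq in field_freqs.values() if freq > 0)


def clause_scores(doc, terms):
    # One score per query clause (one clause per term).
    return [clause_score(doc.get(term, {})) for term in terms]


def balanced_score(doc, terms):
    # Hypothetical adjustment: scale the summed score by min/max of the
    # clause scores, so divergent clauses are penalized.
    scores = clause_scores(doc, terms)
    if max(scores) == 0:
        return 0.0
    return sum(scores) * (min(scores) / max(scores))


terms = ["fast", "car"]

# Single field: sqrt alone already favors the even document.
single_even = {"fast": {"content": 50}, "car": {"content": 50}}
single_skewed = {"fast": {"content": 100}, "car": {"content": 1}}

# Multiple fields: "fast" spread over three fields escapes the sqrt damping.
multi_even = {"fast": {"field1": 5}, "car": {"field1": 5}}
multi_skewed = {
    "fast": {"field1": 3, "field2": 3, "field3": 3},
    "car": {"field1": 1},
}

for name, doc in [
    ("single_even", single_even),
    ("single_skewed", single_skewed),
    ("multi_even", multi_even),
    ("multi_skewed", multi_skewed),
]:
    plain = sum(clause_scores(doc, terms))
    print(name, round(plain, 2), round(balanced_score(doc, terms), 2))
```

Under these assumptions the single-field even document scores about 14.14 against 11.0 for the skewed one, so sqrt does its job there. In the multi-field case the skewed document wins on the plain sum (about 6.20 against 4.47), and only the hypothetical balance factor puts the even document back on top (about 4.47 against 1.19).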