Re: Lopsided scores for each term in BooleanQuery

Andy Liu Mon, 18 Sep 2006 18:47:15 -0700

Our application searches multiple fields, so the query "fast car"
becomes:


+(field1:fast field2:fast field3:fast) +(field1:car field2:car field3:car)
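
To make the expansion concrete, here is a minimal sketch in plain Java that builds this query string. The class and method names are hypothetical and it does not use any Lucene API. It assumes whitespace-separated terms and does no escaping of query-syntax characters.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: expands a user query so that
// every term becomes a required clause that is OR'ed across all fields.
class MultiFieldQuerySketch {

    static String expand(String userQuery, String[] fields) {
        List<String> clauses = new ArrayList<>();
        for (String term : userQuery.trim().split("\\s+")) {
            if (term.isEmpty()) {
                continue; // an empty input yields a single empty token
            }
            List<String> perField = new ArrayList<>();
            for (String field : fields) {
                perField.add(field + ":" + term);
            }
            // "+" marks the whole parenthesised group as required.
            clauses.add("+(" + String.join(" ", perField) + ")");
        }
        return String.join(" ", clauses);
    }

    public static void main(String[] args) {
        String[] fields = {"field1", "field2", "field3"};
        System.out.println(expand("fast car", fields));
    }
}
```

With the three field names above, `expand("fast car", fields)` produces the query string shown.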

I understand that the default sqrt implementation of tf() reduces the
"lopsided score" effect for searches within a single field.  But when
searching multiple fields, this effect is obscured, since each matching
field adds to the score of that clause.  Is there a way to "peek" at the
score of each clause and adjust based on how divergent the scores are?  Or
is there an easier way to do this that I'm just not seeing?
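
A rough numeric illustration of why the effect is obscured, as a sketch only: it assumes each clause scores as the sum of sqrt(tf) over the fields and ignores idf, norms, boosts and coord, so it is not Lucene's actual scoring formula. The class and method names are hypothetical.

```java
// Simplified model: clause score = sum over fields of sqrt(tf in that field).
// This ignores idf, length norms, boosts and coord.
class ClauseTfSketch {

    // tfPerField[i] is the number of times the term occurs in field i.
    static double clauseScore(int[] tfPerField) {
        double sum = 0.0;
        for (int tf : tfPerField) {
            sum += Math.sqrt(tf);
        }
        return sum;
    }

    // Document score = sum of the scores of its clauses.
    static double docScore(int[][] tfPerClause) {
        double sum = 0.0;
        for (int[] clause : tfPerClause) {
            sum += clauseScore(clause);
        }
        return sum;
    }

    public static void main(String[] args) {
        // Lopsided: "fast" 33 times in each of three fields, "car" once.
        int[][] lopsided = {{33, 33, 33}, {1, 0, 0}};
        // Balanced: "fast" and "car" 50 times each, all in one field.
        int[][] balanced = {{50, 0, 0}, {50, 0, 0}};
        System.out.println("lopsided: " + docScore(lopsided));
        System.out.println("balanced: " + docScore(balanced));
    }
}
```

Both documents contain 100 term occurrences in total, yet under this model the lopsided one scores about 18.2 against about 14.1 for the balanced one, because the square root is applied per field before the fields are summed.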

Andy

On 9/18/06, Paul Elschot <[EMAIL PROTECTED]> wrote:


On Monday 18 September 2006 23:08, Andy Liu wrote:
> For multi-word queries, I would like to reward documents that contain a
> more even distribution of each word and penalize documents that have a
> skewed distribution.  For example, if my search query is:
>
> +content:fast +content:car
>
> I would prefer a document that contains each word an equal number of
> times over a document that contains the word "fast" 100 times and the
> word "car" 1 time.  In other words, I would like to compare the scores
> of each BooleanQuery term and adjust the score according to the
> distribution.
>
> Can somebody point me in the right direction as to how I would implement
> this?

It's already there in DefaultSimilarity.tf(), which returns the square root
of the term frequency:

(sqrt(1) + sqrt(1)) > (sqrt(0) + sqrt(2))
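
The inequality can be checked with a small sketch in plain Java. It assumes only the sqrt(tf) factor matters within a single field and leaves out idf, norms and coord; the class and method names are hypothetical.

```java
// Sums sqrt(tf) over the query terms for one field, as a stand-in for the
// tf part of the score. All other scoring factors are left out.
class SqrtTfSketch {

    static double tfScore(int... termFreqs) {
        double sum = 0.0;
        for (int tf : termFreqs) {
            sum += Math.sqrt(tf);
        }
        return sum;
    }

    public static void main(String[] args) {
        // The example above: one occurrence of each term vs. two of one term.
        System.out.println(tfScore(1, 1) > tfScore(0, 2));
        // Same total of 100 occurrences, even split vs. skewed split.
        System.out.println(tfScore(50, 50) > tfScore(99, 1));
    }
}
```

Both comparisons print `true`. The square root favours the even split only when the totals are comparable: under the same model a document with tf 100 and 1 still outscores one with tf 1 and 1.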


Regards,
Paul Elschot
