On Monday 18 September 2006 23:08, Andy Liu wrote:
> For multi-word queries, I would like to reward documents that contain a more
> even distribution of each word and penalize documents that have a skewed
> distribution.  For example, if my search query is:
> 
> +content:fast +content:car
> 
> I would prefer a document that contains each word an equal number of times
> over a document that contains the word "fast" 100 times and the word "car" 1
> time.  In other words, I would like to compare the scores of each
> BooleanQuery term and adjust the score according to the distribution.
> 
> Can somebody point me in the right direction as to how I would implement
> this?

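This is already built in: DefaultSimilarity.tf() returns the square root of
the term frequency, so, other scoring factors being equal, an even
distribution of the same total number of occurrences scores higher than a
skewed one: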

(sqrt(1) + sqrt(1)) > (sqrt(0) + sqrt(2))
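
The inequality above can be checked with a small standalone sketch. It does
not use Lucene itself: the class TfSketch and the helpers tf() and tfSum() are
hypothetical names that only mirror the square-root factor, and it assumes
idf, norms and coord are equal for both terms, so only the tf part is compared.

```java
public class TfSketch {
    // Hypothetical stand-in for the square-root tf factor described above;
    // this is not Lucene's DefaultSimilarity class.
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    // Sums the tf factor over the per-term frequencies of one document.
    // Assumes all other scoring factors (idf, norms, coord) are equal.
    static double tfSum(int... freqs) {
        double sum = 0.0;
        for (int f : freqs) {
            sum += tf(f);
        }
        return sum;
    }

    public static void main(String[] args) {
        // Same total of 2 occurrences: even (1, 1) versus skewed (0, 2).
        System.out.println(tfSum(1, 1) > tfSum(0, 2));
        // Same total of 100 occurrences: even (50, 50) versus skewed (99, 1).
        System.out.println(tfSum(50, 50) > tfSum(99, 1));
    }
}
```

Because the square root is concave, splitting a fixed number of occurrences
evenly across the query terms gives the largest sum in this simplified model.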


Regards,
Paul Elschot