Preventing short documents from being boosted

Wright, Tim Fri, 08 Sep 2006 02:57:45 -0700

Hi all, 

We have an issue where around 10-20% of our documents are much shorter
(only a paragraph or so of text) than all the rest. Because Lucene
factors field length into the score (through a length norm computed at
indexing time), these shorter documents usually end up scoring higher
than the longer ones.
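A minimal sketch of the effect described above, assuming the default length norm of Lucene's `DefaultSimilarity`, which is 1/sqrt(numTerms) for the field being searched. The class name and the document sizes (50 and 5000 terms) are hypothetical, and the code only models the formula; it does not call Lucene.

```java
public class LengthNormDemo {
    // Models DefaultSimilarity.lengthNorm: 1 / sqrt(number of terms in the field).
    static float defaultLengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    public static void main(String[] args) {
        int shortDoc = 50;   // assumed size of a one-paragraph document
        int longDoc = 5000;  // assumed size of a typical long document
        float shortNorm = defaultLengthNorm(shortDoc);
        float longNorm = defaultLengthNorm(longDoc);
        // If the tf and idf parts of the score were equal, the short document
        // would outscore the long one by the ratio of the two norms.
        System.out.printf("short norm = %.4f%n", shortNorm);
        System.out.printf("long norm  = %.4f%n", longNorm);
        System.out.printf("ratio      = %.1f%n", shortNorm / longNorm);
    }
}
```

In practice a longer document often has a higher term frequency for the query terms, which offsets part of this ratio, so the real gap is usually smaller than the norms alone suggest.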

We'd prefer to remove the length factor, or at least reduce its weight,
so that searches return a mixture of long and short documents. Is there
a simple way of doing this? I've considered applying a document boost
based on length, but I'm not sure what equation I'd need to "counter"
the innate prioritisation of short documents.
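One way to think about the "counter" equation, as a sketch under the assumption that the default norm is 1/sqrt(n) and that the document boost is simply multiplied into it: a boost of sqrt(n) cancels the norm completely, and a boost of n^(alpha/2) cancels only a fraction alpha of it, leaving an effective norm of n^(-(1 - alpha)/2). The class, the helper names and the `alpha` parameter are hypothetical; the code models the arithmetic only and does not call Lucene.

```java
public class LengthBoostDemo {
    // Models DefaultSimilarity.lengthNorm: 1 / sqrt(numTerms).
    static double lengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    // Hypothetical helper: a boost that cancels a fraction alpha of the length norm.
    // alpha = 0 keeps the default behaviour; alpha = 1 cancels it completely.
    static double counterBoost(int numTerms, double alpha) {
        return Math.pow(numTerms, alpha / 2.0);
    }

    // Effective norm = boost * lengthNorm = numTerms^(-(1 - alpha) / 2).
    static double effectiveNorm(int numTerms, double alpha) {
        return counterBoost(numTerms, alpha) * lengthNorm(numTerms);
    }

    public static void main(String[] args) {
        int[] lengths = {50, 500, 5000};   // assumed field lengths in terms
        double[] alphas = {0.0, 0.5, 1.0};
        for (double alpha : alphas) {
            StringBuilder line = new StringBuilder("alpha=" + alpha + ":");
            for (int n : lengths) {
                line.append(String.format(" n=%d -> %.4f", n, effectiveNorm(n, alpha)));
            }
            System.out.println(line);
        }
    }
}
```

If this route were taken, the boost would be set with `Document.setBoost(float)` before the document is added, using the term count of the main searched field. Two caveats: the document boost is folded into every field's norm, so the cancellation is exact only for the field whose length was used, and norms are stored in a single byte, so even that cancellation is approximate. Overriding `lengthNorm` in a `DefaultSimilarity` subclass is the alternative that avoids per-document boosts; because the norm is computed at indexing time, that subclass would need to be set on the `IndexWriter` and the documents reindexed.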

Cheers,

Tim.

