Hi all,

We have an issue where around 10-20% of our documents are much shorter (only a paragraph or so of text) than all the rest. Because Lucene considers document length when indexing, most of the time these shorter documents end up being scored higher than the longer ones.
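To illustrate the effect as I understand it, here is a small standalone sketch. It assumes the default similarity computes the length norm as 1 / sqrt(number of terms in the field); the class name, the helper names and the term counts are made up for illustration and are not Lucene APIs. It does not use Lucene itself, and it ignores the fact that norms are stored in a lossy single-byte encoding.

```java
public class LengthNormSketch {

    // Assumption: default length norm is 1 / sqrt(numTerms).
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // Hypothetical counter-boost: on paper, sqrt(numTerms) cancels the norm above.
    static float counterBoost(int numTerms) {
        return (float) Math.sqrt(numTerms);
    }

    public static void main(String[] args) {
        int shortDoc = 100;    // hypothetical: about a paragraph of text
        int longDoc = 10000;   // hypothetical: a full-length document

        float shortNorm = lengthNorm(shortDoc);
        float longNorm = lengthNorm(longDoc);

        // The short document's norm is larger, so its matches score higher.
        System.out.println("short norm = " + shortNorm);
        System.out.println("long norm  = " + longNorm);
        System.out.println("ratio      = " + (shortNorm / longNorm));

        // With the counter-boost applied, both norms come out roughly equal.
        System.out.println("short boosted = " + (shortNorm * counterBoost(shortDoc)));
        System.out.println("long boosted  = " + (longNorm * counterBoost(longDoc)));
    }
}
```

If that assumption is right, a 100-term document gets roughly ten times the norm of a 10,000-term one, which would explain what we are seeing.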
We'd prefer it if we could remove the length factor, or at least reduce the weight of it so that we returned a mixture of long and short documents. Is there a simple way of doing this? I've considered applying a document boost based on length, but I'm not quite sure of the equation I'd have to use to "counter" the innate prioritisation of short documents.

Cheers,
Tim.