http://www.gossamer-threads.com/lists/lucene/java-user/38967#38967
-Grant
On Sep 8, 2006, at 5:57 AM, Wright, Tim wrote:
Hi all,
We have an issue where around 10-20% of our documents are much shorter
(only a paragraph or so of text) than all the rest. Because Lucene
considers document length when indexing, most of the time these
shorter
documents end up being scored higher than the longer ones.
We'd prefer it if we could remove the length factor, or at least
reduce
the weight of it so that we returned a mixture of long and short
documents. Is there a simple way of doing this? I've considered
applying
a document boost based on length, but I'm not quite sure of the
equation
I'd have to use to "counter" the innate prioritisation of short
documents.
Cheers,
Tim.
----------------------------------------------------------------------
----------------------------------------------------------------------
The information contained in this email message may be
confidential. If you are not the intended recipient, any use,
interference with, disclosure or copying of this material is
unauthorised and prohibited. Although this message and any
attachments are believed to be free of viruses, no responsibility
is accepted by Informa for any loss or damage arising in any way
from receipt or use thereof. Messages to and from the company are
monitored for operational reasons and in accordance with lawful
business practices.
If you have received this message in error, please notify us by
return and delete the message and any attachments. Further
enquiries/returns can be sent to [EMAIL PROTECTED]
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
--------------------------
Grant Ingersoll
Sr. Software Engineer
Center for Natural Language Processing
Syracuse University
335 Hinds Hall
Syracuse, NY 13244
http://www.cnlp.org
Voice: 315-443-5484
Skype: grant_ingersoll
Fax: 315-443-6886
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]