Re: Preventing short documents from being boosted

2006-09-08 Thread Daniel Naber
On Friday 08 September 2006 13:30, Grant Ingersoll wrote:
> http://www.gossamer-threads.com/lists/lucene/java-user/38967#38967

I'd be happy to get feedback on that similarity class, i.e. whether someone has used it successfully. If so, we could add it to the Lucene core (the old similarity w

Re: Preventing short documents from being boosted

2006-09-08 Thread Grant Ingersoll
http://www.gossamer-threads.com/lists/lucene/java-user/38967#38967

-Grant

On Sep 8, 2006, at 5:57 AM, Wright, Tim wrote:
Hi all, We have an issue where around 10-20% of our documents are much shorter (only a paragraph or so of text) than all the rest. Because Lucene considers document length

Preventing short documents from being boosted

2006-09-08 Thread Wright, Tim
Hi all,

We have an issue where around 10-20% of our documents are much shorter (only a paragraph or so of text) than all the rest. Because Lucene considers document length when indexing, most of the time these shorter documents end up being scored higher than the longer ones. We'd prefer it if
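
To illustrate the effect described above, here is a minimal sketch in plain Java with no Lucene dependency. It assumes the classic length normalization of 1/sqrt(number of tokens), which to my understanding is what Lucene's DefaultSimilarity used at the time. The class name LengthNormSketch, the method flooredLengthNorm and the 500-token floor are hypothetical, chosen only for illustration. This is not the similarity class from the linked post.

```java
// Sketch only: plain Java, no Lucene on the classpath.
// All names here are hypothetical and used for illustration.
final class LengthNormSketch {

    // Classic length norm: shorter fields get a larger factor,
    // which is why short documents tend to outrank long ones.
    static float defaultLengthNorm(int numTokens) {
        return (float) (1.0 / Math.sqrt(numTokens));
    }

    // Assumed workaround: treat any field shorter than minTokens as if it
    // had exactly minTokens, so very short documents get no extra boost.
    // Fields at or above minTokens keep the classic norm.
    static float flooredLengthNorm(int numTokens, int minTokens) {
        return (float) (1.0 / Math.sqrt(Math.max(numTokens, minTokens)));
    }

    public static void main(String[] args) {
        int[] lengths = {50, 500, 5000};
        for (int n : lengths) {
            System.out.printf("tokens=%d default=%.4f floored=%.4f%n",
                    n, defaultLengthNorm(n), flooredLengthNorm(n, 500));
        }
    }
}
```

In a real index the same idea would presumably go into a Similarity subclass that overrides the length norm and is set on both the writer and the searcher. Because norms are computed at indexing time, changing them would likely require reindexing.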