(NOTE: replying back to java-user, for the reasons listed at http://people.apache.org/~hossman/#private_q )
: Date: Fri, 24 Mar 2006 08:42:29 -0000
: Subject: Re: Changing ranking
:
: Hi Chris,
: Thanks, so would that make it as simple as a document with 5 matching
: occurrences ranks higher than a document with 4 occurrences?

Score calculations tend to be complicated, but if what you really care
about is just the number of occurrences, then omitting norms is one way to
start.

: This should achieve my objective of showing slightly longer documents first
: (really it doesn't actually have to be the longest, I just want to stop
: documents with only two words ranking first)

it won't actually make longer docs appear first -- it will just help
ensure that there is no penalty for a doc being longer. 5 word
occurrences in a 10 word document would probably score the same as those 5
words in a 20 word document; at that point, the order they come back in
might be determined by the order they were added to the index.

how rare each term is across the index (its document frequency, which
drives idf) also comes into play -- if your BooleanQuery contains 10
optional terms, and the 4 that appear the least frequently in your index
appear in one document, and the other 6 appear in a different document --
the doc with the 4 rare ones might wind up scoring higher.

To really understand scoring you should do some experiments, and look at
the Explanation information for your queries to understand how things like
tf and idf impact your score. Then you can think about how you might want
to change your Similarity class to meet your needs.

: > : Is there any way I can change Lucene to rank longer documents with more
: > : phrase occurrences higher
: >
: > if what you care about is only the number of occurrences, and you don't
: > want the length to be a factor at all, then using Field.setOmitNorms(true)
: > on the Field for every document you add will not only accomplish this, but
: > will also save one byte per field per document in your index.
: >
: > that can add up if you have a lot of fields whose length you don't care
: > about.
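To make the points about length norms and rare terms a bit more concrete,
here is a rough, self-contained sketch. It does not call Lucene at all: it
is a toy model that assumes the classic DefaultSimilarity factors
(tf = sqrt(freq), idf = 1 + ln(numDocs / (docFreq + 1)),
lengthNorm = 1 / sqrt(numTerms), coord = matched / total) and ignores
queryNorm, boosts, and the lossy one-byte encoding of norms. The class
name ScoringSketch, the helper queryScore, and all the numbers are
hypothetical, picked only for illustration; the Explanation output for
your own index is the real authority.

```java
// Toy model of classic Lucene-style scoring; NOT the Lucene API.
// Assumption: factors follow DefaultSimilarity in simplified form.
final class ScoringSketch {

    // tf factor: square root of the occurrences of the term in the document.
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    // idf factor: terms that appear in fewer documents get a larger weight.
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    // Length norm: penalizes long fields; a constant 1.0 once norms are omitted.
    static double lengthNorm(int fieldLength, boolean omitNorms) {
        return omitNorms ? 1.0 : 1.0 / Math.sqrt(fieldLength);
    }

    // Score contribution of one term in one document (idf is applied twice,
    // once on the query side and once on the document side).
    static double termScore(int freq, int docFreq, int numDocs,
                            int fieldLength, boolean omitNorms) {
        double idf = idf(docFreq, numDocs);
        return tf(freq) * idf * idf * lengthNorm(fieldLength, omitNorms);
    }

    // Hypothetical helper: a document matches `matched` of `total` optional
    // query terms, once each, every matched term having the same docFreq.
    // Norms are treated as omitted, so field length plays no role.
    static double queryScore(int matched, int docFreqEach, int total, int numDocs) {
        double coord = (double) matched / total;
        return coord * matched * termScore(1, docFreqEach, numDocs, 1, true);
    }

    public static void main(String[] args) {
        int numDocs = 1000; // made-up index size
        int docFreq = 50;   // made-up document frequency

        // Same 5 occurrences, 10-word vs 20-word document.
        System.out.println("with norms, 10 words: " + termScore(5, docFreq, numDocs, 10, false));
        System.out.println("with norms, 20 words: " + termScore(5, docFreq, numDocs, 20, false));
        System.out.println("no norms,   10 words: " + termScore(5, docFreq, numDocs, 10, true));
        System.out.println("no norms,   20 words: " + termScore(5, docFreq, numDocs, 20, true));

        // 4 rare terms (docFreq 5) vs 6 common terms (docFreq 500), out of 10.
        System.out.println("4 rare terms:   " + queryScore(4, 5, 10, numDocs));
        System.out.println("6 common terms: " + queryScore(6, 500, 10, numDocs));
    }
}
```

In this model the two "no norms" lines come out identical, which is the
"no penalty for being longer" effect, and the doc matching the 4 rare
terms outscores the one matching the 6 common terms even after coord
favors the latter. Real scores will differ, so treat this only as a way
to read Explanation output, not as a prediction.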
-Hoss