Sorry to re-post. Is this the correct forum for questions like this? I think that writing new encode/decode operations should alleviate my problem, but I would have thought this is a fairly widespread issue for people using Lucene for "non-web-page" searches (i.e., shorter documents).
Thanks again,
John

On 4/2/07, John Kleven <[EMAIL PROTECTED]> wrote:
My documents are cars, i.e.:

Nissan Altima Sports Package
Nissan Altima Standard

The problem I have is that when I search for "Nissan Altima", I want to get the 2nd hit back first, i.e. "Nissan Altima Standard", because it is shorter. However, this doesn't happen: both documents get exactly the same score.

I know that lengthNorm in Similarity uses 1/sqrt(numTerms), and you would think that would be enough to make sure the order is correct. However, it is not, and I assume this is because the encode/decode functions that pack this value into a single byte do not have the granularity to represent the difference between numbers like 1/sqrt(3) and 1/sqrt(4)?

Is the suggested approach here to rewrite the encode/decode operations, or is there an easier way?

Thanks kindly,
John
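
To check the granularity guess above, here is a minimal standalone sketch of a one-byte norm encoding. It is modeled on my understanding of how Lucene packs norms (the scheme exposed as SmallFloat.floatToByte315 in later releases), so treat the bit layout as an assumption rather than a statement about any particular version. The class and method names (NormPrecisionSketch, encodeNorm, decodeNorm, lengthNorm) are hypothetical and are not Lucene's API.

```java
public class NormPrecisionSketch {

    // Length norm as described in the post: 1/sqrt(numTerms).
    public static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // Pack a float into one byte by keeping only the top bits of its
    // IEEE-754 representation (assumed layout, see the note above).
    public static byte encodeNorm(float f) {
        int bits = Float.floatToRawIntBits(f);
        int small = bits >> (24 - 3);
        if (small <= ((63 - 15) << 3)) {
            // Zero or negative maps to 0; tiny positive values map to 1.
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (small >= ((63 - 15) << 3) + 0x100) {
            return -1; // too large: saturate at the biggest code
        }
        return (byte) (small - ((63 - 15) << 3));
    }

    // Inverse of encodeNorm: rebuild the float that the byte stands for.
    public static float decodeNorm(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    public static void main(String[] args) {
        // Show which field lengths collapse onto the same stored norm.
        for (int numTerms = 1; numTerms <= 8; numTerms++) {
            float exact = lengthNorm(numTerms);
            byte code = encodeNorm(exact);
            System.out.printf("terms=%d exact=%.4f byte=%d decoded=%.4f%n",
                    numTerms, exact, code & 0xff, decodeNorm(code));
        }
    }
}
```

Under this layout the encoder truncates rather than rounds, so 1/sqrt(3) (about 0.577) and 1/sqrt(4) (0.5) both land on the same byte and decode to 0.5. That would explain why the 3-term and 4-term documents score identically. A custom encoding would need finer steps in the range that short fields occupy.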