Re: short documents = help me tweak Similarity??

John Kleven Thu, 05 Apr 2007 10:45:56 -0700

Sorry to re-post -- is this the correct forum for questions like this?  I
think that writing a new encode/decode operation should help alleviate my
problem, but I thought this must be a fairly widespread issue for people
using Lucene for "non-web-page" searches (i.e., shorter documents).


Thanks again,
John

On 4/2/07, John Kleven <[EMAIL PROTECTED]> wrote:


My documents are cars, e.g.:
Nissan Altima Sports Package
Nissan Altima Standard

The problem is that when I search for "Nissan Altima", I want the 2nd
hit, "Nissan Altima Standard", to come back first because it is shorter.
However, this doesn't happen: both are scored exactly the same.

I know that the lengthNorm in DefaultSimilarity uses 1/sqrt(numTerms), and
you would think that would be enough to make sure the order is correct.
However, it is not, and I assume this is because the encode/decode
functions that pack this value into a single byte do not have the
granularity to represent the difference between numbers like 1/sqrt(3)
and 1/sqrt(4)?
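To check that hypothesis without the Lucene jar, here is a standalone Java sketch that re-implements the one-byte norm encoding DefaultSimilarity relies on in Lucene 2.x (SmallFloat.floatToByte315 / byteToFloat315: 3 significant bits, zero exponent 15). The class name NormEncodingDemo is made up, and the bit manipulation is a re-implementation, so it is worth comparing against the SmallFloat source of the Lucene version actually in use.

```java
// Standalone sketch (hypothetical class name): mimics the one-byte norm encoding
// that Lucene 2.x norms use via SmallFloat.floatToByte315 / byteToFloat315.
public class NormEncodingDemo {

    // Encode a float into one byte, truncating (not rounding) the mantissa.
    static byte floatToByte315(float f) {
        int bits = Float.floatToRawIntBits(f);
        int smallfloat = bits >> (24 - 3);      // keep sign, exponent and top mantissa bits
        int fzero = (63 - 15) << 3;              // offset mapping the exponent range onto a byte
        if (smallfloat <= fzero) {
            return (bits <= 0) ? (byte) 0 : (byte) 1; // underflow: 0 for <= 0, else smallest positive
        } else if (smallfloat >= fzero + 0x100) {
            return (byte) -1;                          // overflow: largest representable value
        } else {
            return (byte) (smallfloat - fzero);
        }
    }

    // Decode the byte back into the float the scorer actually sees.
    static float byteToFloat315(byte b) {
        if (b == 0) return 0.0f;
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    // DefaultSimilarity-style length norm: 1/sqrt(numTerms).
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 6; n++) {
            float raw = lengthNorm(n);
            byte enc = floatToByte315(raw);
            System.out.printf("terms=%d raw=%.4f byte=%d decoded=%.4f%n",
                    n, raw, enc & 0xff, byteToFloat315(enc));
        }
    }
}
```

Run as-is, it shows 3 terms (0.5774) and 4 terms (0.5000) both encoding to byte 120, which decodes to 0.5, so "Nissan Altima Standard" and "Nissan Altima Sports Package" end up with identical norms. For comparison, 2 terms decodes to 0.625 and 5 terms to 0.4375.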

Is the suggested approach here to rewrite the encode/decode operations,
or is there an easier way?
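One possible "easier way", offered here as an assumption rather than a confirmed recommendation, is to keep the byte encoding and make the length curve steeper, so that short lengths fall into different byte buckets. The sketch below compares the default 1/sqrt(numTerms) with a hypothetical 1/numTerms curve, both passed through the same re-implemented encoding; the class and method names are illustrative only.

```java
// Hypothetical comparison: default 1/sqrt(numTerms) versus a steeper 1/numTerms
// length norm, both pushed through the same one-byte encoding as Lucene 2.x norms.
public class SteeperNormDemo {

    // Same 3-significant-bit, zero-exponent-15 encoding as SmallFloat.floatToByte315,
    // returned as an unsigned value 0..255 for easier printing.
    static int encode(float f) {
        int bits = Float.floatToRawIntBits(f);
        int smallfloat = bits >> 21;
        int fzero = (63 - 15) << 3;
        if (smallfloat <= fzero) return (bits <= 0) ? 0 : 1; // underflow
        if (smallfloat >= fzero + 0x100) return 255;          // overflow
        return smallfloat - fzero;
    }

    // Default curve, as in DefaultSimilarity.lengthNorm.
    static float sqrtNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // Hypothetical steeper curve; in Lucene this body would go in a lengthNorm override.
    static float linearNorm(int numTerms) {
        return 1.0f / numTerms;
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 8; n++) {
            System.out.printf("terms=%d sqrtByte=%d linearByte=%d%n",
                    n, encode(sqrtNorm(n)), encode(linearNorm(n)));
        }
    }
}
```

With the steeper curve, 3 and 4 terms encode to bytes 117 and 116 instead of colliding, though collisions still reappear at longer lengths (7 and 8 terms share a byte). Assuming the 2.x API, the idea would be to subclass DefaultSimilarity, override lengthNorm(String fieldName, int numTerms), set it on the IndexWriter, and reindex, since norms are written at index time.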

Thanks kindly -
John
