It is the right forum, silence just means either no one knows the
answer or no one who knows the answer has read it... Such is the
nature of the community.
Have you looked at overriding similarity with your own
implementation? Have you done explain() calls on the docs to see
where the scores are coming from? You may be seeing other factors at
play.
You might also try searching the archives for length normalization.
I seem to recall someone talking about the opposite problem, calling
it "fair" similarity, so maybe you could use that as a basis for your
implementation (by doing the opposite).
-Grant
On Apr 5, 2007, at 1:45 PM, John Kleven wrote:
Sorry to re-post -- is this the correct forum for questions like
this? I
think that writing a new encode/decode operation should help
alleviate my
problem, but thought that this must be fairly widespread issue for
people
using lucene for "non-web-page" searches (i.e., shorter documents)
Thanks again,
John
On 4/2/07, John Kleven <[EMAIL PROTECTED]> wrote:
My documents are cars...
i.e.,
Nissan Altima Sports Package
Nissan Altima Standard
The problem I have is when i search "Nissan Altima", I want to get
the 2nd
hit back first, i.e. "Nissan Altima Standard", because it is shorter.
However, this doesn't happen. They are both scored the exact same.
I know that the lengthNorm in Similarity is using 1/sqrt
(numTerms), and
you would think that would be enuff to make sure the order is
correct.
However, it is not, and I assume this is because of the encode/decode
functions that pack this value into a single byte do not have the
granularity to represent differences between numbers like 1/sqrt
(3) vs
1/sqrt(4)??
Is the suggested approach here to re-write the encode/decode
operations,
or is there any easier way?
Thanks kindly -
John
--------------------------
Grant Ingersoll
Center for Natural Language Processing
http://www.cnlp.org
Read the Lucene Java FAQ at http://wiki.apache.org/jakarta-lucene/
LuceneFAQ
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]