: I guess you could hack a "all norms are 1" behavior by
: writing something similar to OneNormsReader in
: org.apache.lucene.demo.SearchFiles.
Wow ... that's quite the little hidden gem in the Demo.
The gist of the impl is pretty much exactly what you need, but
SegmentReader has a static utilit
Hi Walt,
AFAIK there is no flag guiding scorers to "ignore norms".
I guess you could hack a "all norms are 1" behavior by
writing something similar to OneNormsReader in
org.apache.lucene.demo.SearchFiles.
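Here is a rough, Lucene-free model of the "all norms are 1" hack. A wrapper sits in front of whatever hands out norms and reports the same norm byte for every document. `NormsSource` and `ConstantNormsSource` are hypothetical names. In Lucene itself this would presumably be a `FilterIndexReader` subclass whose `norms(String)` returns such an array. Check which `norms` overloads your version has.

```java
import java.util.Arrays;

/** Hypothetical stand-in for the part of a reader that hands out per-document norms. */
interface NormsSource {
    byte[] norms(String field);
    int maxDoc();
}

/** Wraps another source and reports the same norm byte for every document and field. */
final class ConstantNormsSource implements NormsSource {
    private final NormsSource in;
    private final byte normByte; // e.g. the byte Lucene's Similarity.encodeNorm(1.0f) produces

    ConstantNormsSource(NormsSource in, byte normByte) {
        this.in = in;
        this.normByte = normByte;
    }

    @Override
    public byte[] norms(String field) {
        // Ignore the stored norms entirely; every document gets the same value.
        byte[] fake = new byte[in.maxDoc()];
        Arrays.fill(fake, normByte);
        return fake;
    }

    @Override
    public int maxDoc() {
        return in.maxDoc();
    }
}
```

A real version would probably cache the fake array instead of allocating one per call.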
Doron
Walt Stoneburner wrote on 01/06/2007 13:45:26:
> I've managed to build my own Simila
: What I'm trying to do is prevent Lucene from ranking documents that
: use a term multiple times above those that have more unique
: term hits.
:
: I've got some huge queries with quite a number of unique terms. I
: want the documents that hit more unique terms to float to the top,
Grant writes:
One question that comes to mind is: what are you looking to do?
What I'm trying to do is prevent Lucene from ranking documents that
use a term multiple times above those that have more unique
term hits.
I've got some huge queries with quite a number of unique terms.
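This is a minimal, Lucene-free sketch of the ranking described above. Each document is scored by how many distinct query terms it contains, so repeating one term earns nothing extra. `DistinctTermRanker` and its methods are hypothetical. Documents are modeled as token lists rather than Lucene fields.

```java
import java.util.*;

/** Hypothetical helper: ranks documents by the number of distinct query terms matched. */
final class DistinctTermRanker {
    /** Distinct query terms that occur at least once in the document. */
    static int distinctHits(Set<String> queryTerms, List<String> docTokens) {
        Set<String> seen = new HashSet<>(docTokens);
        seen.retainAll(queryTerms);
        return seen.size();
    }

    /** Total occurrences of query terms, i.e. what a tf-driven score rewards. */
    static int totalHits(Set<String> queryTerms, List<String> docTokens) {
        int n = 0;
        for (String token : docTokens) {
            if (queryTerms.contains(token)) {
                n++;
            }
        }
        return n;
    }

    /** Document ids sorted by distinct hits, highest first (ties keep input order). */
    static List<String> rank(Set<String> queryTerms, Map<String, List<String>> docs) {
        List<String> ids = new ArrayList<>(docs.keySet());
        ids.sort(Comparator.comparingInt(
                (String id) -> distinctHits(queryTerms, docs.get(id))).reversed());
        return ids;
    }
}
```

Suppose one document repeats "spider" four times and another contains "spider", "man" and "hd" once each. The first has more total hits (4 vs 3), but the second ranks first because it covers three distinct terms instead of one.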
The score normalization is actually more important for purposes of
review. It is possible that both D1 and D2 properly match
F1. Some customers have repeats of the same film (e.g. Spiderman 2 and
Spiderman 2 in HD). When the system goes through and records the
potential matches, our r
I have no particular experience with matching
problems, so the following might be off target...
Anyhow, if I understand correctly, the problem is that,
currently, given a set of customer film descriptions
{D1, D2, ..., Dn}, a set of n queries is created
and each query can match at most one film in th
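Here is a hypothetical sketch, in plain Java rather than Lucene, that makes the D1/D2/F1 situation concrete. It keeps only each description's top-scoring film, then groups descriptions by that film. Several descriptions landing on the same film (such as a title and its HD repeat) then show up together for review. The input shape (description, then film, then score) and the class name `MatchReview` are assumptions.

```java
import java.util.*;

/** Hypothetical review helper: best film per description, grouped by film. */
final class MatchReview {
    /**
     * scores.get(desc).get(film) is the score of film for the query built from desc.
     * Each description keeps only its highest-scoring film (if it matched anything).
     */
    static Map<String, List<String>> groupByBestFilm(Map<String, Map<String, Float>> scores) {
        Map<String, List<String>> byFilm = new TreeMap<>();
        for (Map.Entry<String, Map<String, Float>> e : scores.entrySet()) {
            String best = null;
            float bestScore = Float.NEGATIVE_INFINITY;
            for (Map.Entry<String, Float> hit : e.getValue().entrySet()) {
                if (hit.getValue() > bestScore) {
                    bestScore = hit.getValue();
                    best = hit.getKey();
                }
            }
            if (best != null) {
                // Descriptions are appended in the iteration order of the input map.
                byFilm.computeIfAbsent(best, f -> new ArrayList<>()).add(e.getKey());
            }
        }
        return byFilm;
    }
}
```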
This may be a five-year-old explaining to a four-year-old why the sky
is blue, but I'll share some of the stuff I've picked up. :)
My application isn't so much a search engine as a matching engine. I
take a large list of movie documents from a customer like a movie
channel or a cable provider an
Hi Walt,
One question that comes to mind is: what are you looking to do? Are
you not happy with the current scoring, or are you just trying to better
understand scoring? The calls to Similarity.tf(), etc. are callbacks
from within the scoring algorithm (have a look at TermScorer in
the code
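To show what "callbacks from within the scoring algorithm" means, here is a deliberately simplified model. A scorer loops over the query terms and asks a pluggable similarity for tf, idf, length norm and coord. `DefaultLike` uses the formulas I understand Lucene's `DefaultSimilarity` to use. `FlatSimilarity` is a hypothetical variant that ignores repetition and field length. Real Lucene scoring also folds in query weights, boosts, queryNorm and byte-encoded norms, all of which are omitted here. `SimilarityModel` and `ToyScorer` are made-up names.

```java
import java.util.*;

/** Hypothetical, simplified mirror of the callbacks a Lucene Similarity supplies. */
interface SimilarityModel {
    float tf(float freq);
    float idf(int docFreq, int numDocs);
    float lengthNorm(int numTerms);
    float coord(int overlap, int maxOverlap);
}

/** Formulas in the style of Lucene's DefaultSimilarity. */
final class DefaultLike implements SimilarityModel {
    public float tf(float freq) { return (float) Math.sqrt(freq); }
    public float idf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }
    public float lengthNorm(int numTerms) { return (float) (1.0 / Math.sqrt(numTerms)); }
    public float coord(int overlap, int maxOverlap) { return overlap / (float) maxOverlap; }
}

/** Repetition, rarity and field length all count as 1; only coord still varies. */
final class FlatSimilarity implements SimilarityModel {
    public float tf(float freq) { return freq > 0 ? 1f : 0f; }
    public float idf(int docFreq, int numDocs) { return 1f; }
    public float lengthNorm(int numTerms) { return 1f; }
    public float coord(int overlap, int maxOverlap) { return overlap / (float) maxOverlap; }
}

final class ToyScorer {
    /** Sum of tf * idf * lengthNorm over matched query terms, multiplied by coord. */
    static float score(SimilarityModel sim, Map<String, Integer> freqs, int docLength,
                       List<String> queryTerms, Map<String, Integer> docFreqs, int numDocs) {
        float sum = 0f;
        int overlap = 0;
        for (String term : queryTerms) {
            int freq = freqs.getOrDefault(term, 0);
            if (freq == 0) {
                continue; // unmatched terms contribute nothing and do not count toward coord
            }
            overlap++;
            sum += sim.tf(freq) * sim.idf(docFreqs.get(term), numDocs) * sim.lengthNorm(docLength);
        }
        return sum * sim.coord(overlap, queryTerms.size());
    }
}
```

Under `FlatSimilarity` the score reduces to overlap * overlap / maxOverlap, which grows only with the number of distinct terms matched.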
On 5/30/07, Walt Stoneburner <[EMAIL PROTECTED]> wrote:
a) Where does freq come from? (Not what it is, but who computes it and how?)
For a single term, it's determined at index time and stored in the index.
TermDocs gives you a list of documents containing the term, and for
each document, the
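Here is a toy model of where freq comes from. It does not reflect Lucene's actual file format. Occurrences are counted per document when the document is added and stored alongside the doc id. At search time they are read back as (doc, freq) pairs, which is roughly what `TermDocs.doc()` and `TermDocs.freq()` expose while iterating with `next()`. `ToyIndex` and its methods are hypothetical.

```java
import java.util.*;

/** Toy inverted index: term -> (docId -> freq), filled in at index time. */
final class ToyIndex {
    private final Map<String, TreeMap<Integer, Integer>> postings = new HashMap<>();

    /** Index time: count each token's occurrences in this document and store the counts. */
    void addDocument(int docId, List<String> tokens) {
        for (String token : tokens) {
            postings.computeIfAbsent(token, t -> new TreeMap<>())
                    .merge(docId, 1, Integer::sum);
        }
    }

    /** Search time: (doc, freq) pairs for a term, in increasing doc order. */
    List<int[]> termDocs(String term) {
        List<int[]> out = new ArrayList<>();
        TreeMap<Integer, Integer> p = postings.get(term);
        if (p == null) {
            return out; // term never indexed
        }
        for (Map.Entry<Integer, Integer> e : p.entrySet()) {
            out.add(new int[] {e.getKey(), e.getValue()});
        }
        return out;
    }
}
```

The point is that nothing is computed at query time. The scorer just reads the stored count and hands it to `Similarity.tf()`.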