Re: The values which compute scores. - Part II

2007-06-03 Thread Chris Hostetter
: I guess you could hack a "all norms are 1" behavior by writing something similar to OneNormsReader in org.apache.lucene.demo.SearchFiles.
Wow ... that's quite the little hidden gem in the Demo. The gist of the impl is pretty much exactly what you need, but SegmentReader has a static utilit

Re: The values which compute scores. - Part II

2007-06-01 Thread Doron Cohen
Hi Walt,
AFAIK there is no flag guiding scorers to "ignore norms". I guess you could hack a "all norms are 1" behavior by writing something similar to OneNormsReader in org.apache.lucene.demo.SearchFiles.
Doron
Walt Stoneburner wrote on 01/06/2007 13:45:26:
> I've managed to build my own Simila
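To make the "all norms are 1" idea concrete, here is a minimal sketch in plain Java. It is a toy model, not the demo's OneNormsReader and not Lucene's API: `NormSource`, `lengthNorms`, `allOnes`, and `score` are hypothetical names invented for illustration. It assumes a length-based norm of 1/sqrt(number of terms), which is what Lucene's default length normalization computes, and shows a wrapper that hides it.

```java
// Toy model only: illustrates the idea of forcing every norm to 1.0.
// None of these names come from Lucene.
class ConstantNormsDemo {
    // Hypothetical stand-in for "something that reports a norm per document".
    interface NormSource {
        float norm(int doc);
    }

    // Length-based norms: shorter documents get a larger factor.
    static NormSource lengthNorms(int[] docLengths) {
        return doc -> (float) (1.0 / Math.sqrt(docLengths[doc]));
    }

    // Wrapper that ignores the underlying norms and reports 1.0 for every document.
    static NormSource allOnes(NormSource ignored) {
        return doc -> 1.0f;
    }

    // The norm is applied as a plain multiplier on the rest of the score.
    static float score(float rawScore, NormSource norms, int doc) {
        return rawScore * norms.norm(doc);
    }
}
```

With the wrapper in place, document length no longer changes the score, so a 4-term and a 16-term document with the same raw score tie.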

Re: The values which compute scores.

2007-05-31 Thread Chris Hostetter
: What I'm trying to do is prevent Lucene from providing better ranking for documents that use a term multiple times than those that have more term hits.
: I've got some huge queries with quite a number of unique terms. I want the documents that hit more unique terms to float to the top,

Re: The values which compute scores.

2007-05-31 Thread Walt Stoneburner
Grant writes:
> One question that comes to mind, is what are you looking to do?
What I'm trying to do is prevent Lucene from providing better ranking for documents that use a term multiple times than those that have more term hits. I've got some huge queries with quite a number of unique terms.
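A small sketch of the ranking goal described here, in plain Java. It is a toy model and not Lucene code: `FlatTfDemo`, `sqrtTf`, `flatTf`, and `score` are hypothetical names, and the score is just a sum of per-term tf values with no idf, norms, or coord factor. It contrasts a square-root tf (the shape Lucene's default Similarity uses for tf) with a flat tf that counts any number of occurrences once.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: mimics the idea of a Similarity whose tf() ignores repeats.
class FlatTfDemo {
    // Square-root tf: repeated occurrences keep raising the score.
    static float sqrtTf(int freq) {
        return (float) Math.sqrt(freq);
    }

    // Flat tf: a term counts once, however often it occurs.
    static float flatTf(int freq) {
        return freq > 0 ? 1.0f : 0.0f;
    }

    // Count whitespace-separated tokens in a document.
    static Map<String, Integer> termFreqs(String text) {
        Map<String, Integer> freqs = new HashMap<>();
        for (String token : text.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                freqs.merge(token, 1, Integer::sum);
            }
        }
        return freqs;
    }

    // Sum the chosen tf over the query terms; everything else is left out.
    static float score(List<String> queryTerms, String docText, boolean flat) {
        Map<String, Integer> freqs = termFreqs(docText);
        float sum = 0f;
        for (String term : queryTerms) {
            int freq = freqs.getOrDefault(term, 0);
            sum += flat ? flatTf(freq) : sqrtTf(freq);
        }
        return sum;
    }
}
```

For the query terms a, b, c, a document that repeats "a" nine times beats a document containing "a b" under the square-root tf, while the flat tf ranks "a b" first because it hits two unique terms.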

Re: The values which compute scores.

2007-05-31 Thread Daniel Einspanjer
The score normalization is actually more important for purposes of review. It actually is possible that both D1 and D2 properly match to F1. Some customers have repeats of the same film (e.g. Spiderman 2 and Spiderman 2 in HD). When the system goes through and records the potential matches, our r

Re: The values which compute scores.

2007-05-31 Thread Doron Cohen
I have no particular experience with matching problems so the following might be off target... Anyhow, if I understand correctly, problem is that, currently, given a set of customer film descriptions {D1, D2, ... , Dn}, a set of n queries are created and each query can match at most one film in th

Re: The values which compute scores.

2007-05-30 Thread Daniel Einspanjer
This may be a five year old explaining to a four year old why the sky is blue, but I'll share some of the stuff I've picked up. :) My application isn't so much a search engine as a matching engine. I take a large list of movie documents from a customer like a movie channel or a cable provider an

Re: The values which compute scores.

2007-05-30 Thread Grant Ingersoll
Hi Walt, One question that comes to mind is, what are you looking to do? Are you not happy with the current scoring, or are you just trying to better understand scoring? The calls to Similarity.tf(), etc. are callbacks from within the scoring algorithm (have a look at TermScorer in the code

Re: The values which compute scores.

2007-05-30 Thread Yonik Seeley
On 5/30/07, Walt Stoneburner <[EMAIL PROTECTED]> wrote:
> a) Where does freq come from? (Not what is it, but who computes it and how?)
For a single term, it's determined at index time and stored in the index. TermDocs gives you a list of documents containing the term, and for each document, the
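The point that freq is computed at index time and then simply read back can be pictured with a toy inverted index in plain Java. This is only a sketch of the idea, not Lucene's index format or its TermDocs API: `ToyPostings`, `Posting`, `addDocument`, and `termDocs` are hypothetical names, and tokenization is a naive whitespace split.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: per-term postings that store (doc id, freq) pairs.
class ToyPostings {
    // One posting: a document id and how often the term occurs in that document.
    static final class Posting {
        final int doc;
        final int freq;

        Posting(int doc, int freq) {
            this.doc = doc;
            this.freq = freq;
        }
    }

    // term -> (doc id -> freq); filled once, when a document is added.
    private final Map<String, TreeMap<Integer, Integer>> index = new LinkedHashMap<>();

    // "Index time": count each token and store the count with the document id.
    void addDocument(int doc, String text) {
        for (String token : text.toLowerCase().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            index.computeIfAbsent(token, t -> new TreeMap<>()).merge(doc, 1, Integer::sum);
        }
    }

    // "Search time": no counting happens here, the stored freq is just read back.
    List<Posting> termDocs(String term) {
        List<Posting> out = new ArrayList<>();
        TreeMap<Integer, Integer> docs = index.get(term);
        if (docs != null) {
            for (Map.Entry<Integer, Integer> e : docs.entrySet()) {
                out.add(new Posting(e.getKey(), e.getValue()));
            }
        }
        return out;
    }
}
```

A scorer built on this would walk `termDocs(term)` and pass each stored freq to its tf function, never re-reading the document text.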