Re: Scoring formula - Average number of terms in IDF

2009-12-18 Thread Michael McCandless
I'm not sure this specific detail (how IndexWriter uses Similarity) is documented; the best "documentation" is the source code ;) Have a look at org.apache.lucene.index.NormsWriterPerField. That's where the default indexing chain asks Similarity to create the norm. Mike On Fri, Dec 18, 2009 at 5:12 AM, kdev wrote: > >

Re: Scoring formula - Average number of terms in IDF

2009-12-18 Thread kdev
The avg is used only in the idf method of the Similarity class. So I guess there is a workaround for what I want to do. Can you give me a reference, in the Lucene docs, on how an IndexWriter uses the provided Similarity class? Thanks again for your time and your help. Michael McCandless-2 wrote: > > I

Re: Scoring formula - Average number of terms in IDF

2009-12-17 Thread Michael McCandless
IndexWriter uses Similarity.lengthNorm to create a norm (boost for the field, per document) based on the length of the field... it doesn't invoke the other methods on Similarity. Are you saying you need to know the avg across the whole corpus before computing that boost? Mike On Thu, Dec 17, 200
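
For readers following the thread, here is a minimal sketch of the kind of value lengthNorm produces. It assumes the formula used by DefaultSimilarity in Lucene releases of that era, 1 / sqrt(numTerms). The class name LengthNormSketch and the guard for empty fields are illustrative assumptions, not Lucene code.

```java
// Sketch only: reproduces the 1/sqrt(numTerms) formula attributed to
// DefaultSimilarity.lengthNorm. LengthNormSketch is a hypothetical name
// and does not depend on Lucene.
final class LengthNormSketch {

    // Shorter fields get a larger norm, so a match in a short field weighs more.
    static float lengthNorm(int numTerms) {
        if (numTerms <= 0) {
            return 0f; // assumption: this guard is added here, it is not Lucene's behavior
        }
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 4, 100}) {
            System.out.println(n + " terms -> norm " + lengthNorm(n));
        }
    }
}
```

Note that this value depends only on the length of the one field being indexed, which is why a corpus-wide average is not available at this point.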

Re: Scoring formula - Average number of terms in IDF

2009-12-17 Thread kdev
If I follow your approach, and produce the avg (outside of Lucene) while I'm building the index (due to performance reasons I can't wait for all the documents to arrive before indexing them) for a collection, the avg will be ready only when all the documents of the collection are indexed. Lucene s

Re: Scoring formula - Average number of terms in IDF

2009-12-17 Thread Michael McCandless
There have been some discussions here: https://issues.apache.org/jira/browse/LUCENE-2091 about how Lucene could track avg field/doc length, but they are just brainstorming-type discussions for now. You could always do something approximate outside of Lucene? E.g., make a TokenFilter that counts
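
The snippet above is cut off, so the following is only a guess at the shape of such an approximation: a small accumulator, kept outside Lucene, that maintains a running average of tokens per document. How the per-document token count is obtained (for example from a counting TokenFilter) is left open; the class and method names are hypothetical.

```java
// Sketch only: a running average of field length maintained outside Lucene.
// RunningAvgFieldLength, addDocument and average are hypothetical names.
final class RunningAvgFieldLength {
    private long docCount;
    private long tokenCount;

    // Call once per indexed document with the number of tokens seen for the field.
    synchronized void addDocument(int numTokens) {
        docCount++;
        tokenCount += numTokens;
    }

    // Average over the documents seen so far; approximate until indexing finishes.
    synchronized double average() {
        return docCount == 0 ? 0.0 : (double) tokenCount / docCount;
    }

    public static void main(String[] args) {
        RunningAvgFieldLength avg = new RunningAvgFieldLength();
        for (int len : new int[] {10, 20, 30}) {
            avg.addDocument(len);
            System.out.println("after " + len + " tokens, avg = " + avg.average());
        }
    }
}
```

The trade-off is the one raised elsewhere in this thread: the value is exact only once every document of the collection has been counted, so anything computed from it during indexing uses a partial average.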

Re: Scoring formula - Average number of terms in IDF

2009-12-15 Thread kdev
Any ideas, please?

Re: scoring formula

2006-08-04 Thread Zhao, Xin
Hi, Erik, What do you think about the difference? Thank you very much for your reply, Xin - Original Message - From: "Erik Hatcher" <[EMAIL PROTECTED]> To: Sent: Wednesday, August 02, 2006 3:56 PM Subject: Re: scoring formula Please disregard my previous quick re

Re: scoring formula

2006-08-02 Thread Erik Hatcher
Please disregard my previous quick reply as I did not fully read your message before replying. *ugh* Erik On Aug 2, 2006, at 2:32 PM, Zhao, Xin wrote: Hi, I noticed the scoring formula in the errata of the book "Lucene in Action" is a little different from the one in the Javadoc. I enclos

Re: scoring formula

2006-08-02 Thread Erik Hatcher
Xin, You're correct. This was noted as an erratum here: www.lucenebook.com/blog/errata/2005/01/24/scoring_formula_omission.html All other known errata are here: (and searchable via Lucene, as in

Re: Scoring formula

2005-11-05 Thread Otis Gospodnetic
case? > > Karl > > > --- Original Message --- > > From: Yonik Seeley <[EMAIL PROTECTED]> > > To: java-user@lucene.apache.org > > Subject: Re: Scoring formula > > Date: Sat, 5 Nov 2005 17:49:40 -0500 > > > > Lucene 1.2 is before my time, but check if

Re: Scoring formula

2005-11-05 Thread Otis Gospodnetic
s the score is always > between > 0.0 and 1.0 (without any boosting)... Is this the case? > > Karl > > > --- Original Message --- > > From: Otis Gospodnetic <[EMAIL PROTECTED]> > > To: java-user@lucene.apache.org > > Subject: Re: Scoring formula >

Re: Scoring formula

2005-11-05 Thread Karl Koch
I always thought that a Lucene search always returns a Hits object. On what occasion would this not be the case? Karl > --- Original Message --- > From: Yonik Seeley <[EMAIL PROTECTED]> > To: java-user@lucene.apache.org > Subject: Re: Scoring formula > Date: Sat

Re: Scoring formula

2005-11-05 Thread Yonik Seeley
Lucene 1.2 is before my time, but check if the functions are implemented the same as the current version (they probably are). Scores are not naturally <= 1, but for most search methods (including all that return Hits) they are normalized to be between 1 and 0 if the highest score is greater than 1
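
To illustrate the normalization described above, here is a minimal sketch under the assumption that scores are divided by the top score only when that top score exceeds 1. It is an illustration of the idea, not the Hits source; HitsNormalizationSketch and normalize are hypothetical names.

```java
// Sketch only: scales raw scores so the best hit is 1.0, but only when the
// best raw score is greater than 1. Names here are hypothetical.
final class HitsNormalizationSketch {

    static float[] normalize(float[] rawScores) {
        float max = 0f;
        for (float s : rawScores) {
            max = Math.max(max, s);
        }
        // Leave scores untouched when the top score is already <= 1.
        float scale = max > 1.0f ? 1.0f / max : 1.0f;
        float[] out = new float[rawScores.length];
        for (int i = 0; i < rawScores.length; i++) {
            out[i] = rawScores[i] * scale;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(normalize(new float[] {4f, 2f, 1f})));
        System.out.println(java.util.Arrays.toString(normalize(new float[] {0.5f, 0.2f})));
    }
}
```

One consequence of scaling per result set is that normalized scores from different queries are not directly comparable.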

Re: Scoring formula

2005-11-05 Thread Karl Koch
> --- Original Message --- > From: Otis Gospodnetic <[EMAIL PROTECTED]> > To: java-user@lucene.apache.org > Subject: Re: Scoring formula > Date: Fri, 4 Nov 2005 12:12:52 -0800 (PST) > > The formula should also be in the javadoc for Similarity class, if it > was there in 1.2.

Re: Scoring formula

2005-11-04 Thread Otis Gospodnetic
The formula should also be in the javadoc for Similarity class, if it was there in 1.2. Otis --- Karl Koch <[EMAIL PROTECTED]> wrote: > Hello group, > > the scoring formula for Lucene is well explained in "Lucene in > Action". > However, is this formula also valid for Lucene 1.2 (which I am >