I'm not sure this specific detail (how IW uses Similarity) is documented -- best "documentation" is the source code ;)
Have a look at oal.index.NormsWriterPerField.  That's where the default
indexing chain asks Similarity to create the norm.

Mike

On Fri, Dec 18, 2009 at 5:12 AM, kdev <v.verro...@di.uoa.gr> wrote:
>
> The avg is used only in the idf method of the Similarity class. So I guess
> there is a workaround for what I want to do. Can you give me a reference, in
> the Lucene docs, on how an IndexWriter uses the provided Similarity class?
>
> Thanks again for your time and your help.
>
>
> Michael McCandless-2 wrote:
>>
>> IndexWriter uses Similarity.lengthNorm to create a norm (boost for the
>> field, per document) based on the length of the field... it doesn't
>> invoke the other methods on Similarity.
>>
>> Are you saying you need to know the avg across the whole corpus before
>> computing that boost?
>>
>> Mike
>>
>> On Thu, Dec 17, 2009 at 10:50 AM, kdev <v.verro...@di.uoa.gr> wrote:
>>>
>>> If I follow your approach and produce the avg (outside of Lucene) while
>>> I'm building the index (for performance reasons I can't wait for all the
>>> documents to arrive before indexing them) for a collection, the avg will
>>> be ready only when all the documents of the collection are indexed.
>>> Lucene states that the new similarity class must be set in
>>> IndexWriter.setSimilarity() and be used while I build the index, and at
>>> that time the avg isn't ready yet. Is there a way to overcome this? And
>>> if I don't calculate the score while the index is being created, but
>>> only when searching the index, what will the consequence in performance
>>> be?
>>>
>>> (Mike, thank you for your response)
>>>
>>>
>>> Michael McCandless-2 wrote:
>>>>
>>>> There have been some discussions, here:
>>>>
>>>>     https://issues.apache.org/jira/browse/LUCENE-2091
>>>>
>>>> about how Lucene could track avg field/doc length, but they are just
>>>> brainstorming type discussions now.
>>>>
>>>> You could always do something approximate outside of Lucene?  EG, make
>>>> a TokenFilter that counts how many tokens are produced for each
>>>> field/doc, aggregate & store that yourself, and use it in your
>>>> similarity impl?
>>>>
>>>> Mike
>>>>
>>>> On Tue, Dec 15, 2009 at 5:04 AM, kdev <v.verro...@di.uoa.gr> wrote:
>>>>>
>>>>> any ideas please?
>>>>> --
>>>>> View this message in context:
>>>>> http://old.nabble.com/Scoring-formula---Average-number-of-terms-in-IDF-tp26282578p26792364.html
>>>>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
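
A minimal sketch of the "aggregate & store that yourself" part of the suggestion quoted above, in plain Java with no Lucene dependency. FieldLengthStats and its methods are hypothetical names, not Lucene API. The wiring is assumed and not shown: a counting TokenFilter would call record() once per document when the field's token stream is exhausted, and a custom Similarity would read averageLength() later.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper (not part of Lucene): accumulates per-document token
// counts for one field while documents are being indexed.
final class FieldLengthStats {
    private final AtomicLong totalTokens = new AtomicLong();
    private final AtomicLong documentCount = new AtomicLong();

    // Call once per document with the number of tokens the field produced.
    void record(int tokenCount) {
        if (tokenCount < 0) {
            throw new IllegalArgumentException("tokenCount must be >= 0");
        }
        totalTokens.addAndGet(tokenCount);
        documentCount.incrementAndGet();
    }

    // Number of documents recorded so far.
    long documentCount() {
        return documentCount.get();
    }

    // Running average of tokens per document; 0.0 before anything is recorded.
    // The two counters are read separately, so under concurrent writers the
    // value is only approximate until indexing has finished.
    double averageLength() {
        long docs = documentCount.get();
        return docs == 0 ? 0.0 : (double) totalTokens.get() / docs;
    }
}
```

Since, per the thread, IndexWriter only invokes lengthNorm and the avg is needed only in idf, the running average can keep growing during indexing and be read once searching starts. It would need to be persisted alongside the index (for example in a small side file) to survive a restart; that part is left out here.
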

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org