The avg is used only in the idf method of the Similarity class. So I guess there is workaround for what I want to do. Can you give me a reference, on lucene doc, on how a IndexWriter uses the provided Similarity class?
Thanks again for your time and your help. Michael McCandless-2 wrote: > > IndexWriter uses Similarity.lengthNorm to create a norm (boost for the > field, per document) based on the length of the field... it doesn't > invoke the other methods on Similarity. > > Are you saying you need to know the avg across the whole corpus before > computing that boost? > > Mike > > On Thu, Dec 17, 2009 at 10:50 AM, kdev <v.verro...@di.uoa.gr> wrote: >> >> If I follow your approach, and produce the avg(outside of Lucene) while I >> 'm >> building the index(due to performance reasons I can't wait for all the >> documents to arrive before indexing them) for a collection, the avg will >> be >> ready only when all the documents of the collection are indexed. >> Lucene states that the new similarity class must be set in >> IndexWriter.setSimilarity(), and be used while I build the index, and in >> this time the avg isn't ready yet. Is there a way to overcome this? And >> if >> not calculating the score while the index is being created, and only when >> searching the index, what will the consequence in performance be? >> >> (Mike thank you about your response) >> >> >> Michael McCandless-2 wrote: >>> >>> There have been some discussions, here: >>> >>> https://issues.apache.org/jira/browse/LUCENE-2091 >>> >>> about how Lucene could track avg field/doc length, but they are just >>> brainstorming type discussions now. >>> >>> You could always do something approximate outside of Lucene? EG, make >>> a TokenFilter that counts how many tokens are produced for each >>> field/doc, aggregate & store that yourself, and use it in your >>> similarity impl? >>> >>> Mike >>> >>> On Tue, Dec 15, 2009 at 5:04 AM, kdev <v.verro...@di.uoa.gr> wrote: >>>> >>>> any ideas please? >>>> -- >>>> View this message in context: >>>> http://old.nabble.com/Scoring-formula---Average-number-of-terms-in-IDF-tp26282578p26792364.html >>>> Sent from the Lucene - Java Users mailing list archive at Nabble.com. >>>> >>>> >>>> --------------------------------------------------------------------- >>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org >>>> For additional commands, e-mail: java-user-h...@lucene.apache.org >>>> >>>> >>> >>> --------------------------------------------------------------------- >>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org >>> For additional commands, e-mail: java-user-h...@lucene.apache.org >>> >>> >> >> -- >> View this message in context: >> http://old.nabble.com/Scoring-formula---Average-number-of-terms-in-IDF-tp26282578p26830145.html >> Sent from the Lucene - Java Users mailing list archive at Nabble.com. >> >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org >> For additional commands, e-mail: java-user-h...@lucene.apache.org >> >> > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > > -- View this message in context: http://old.nabble.com/Scoring-formula---Average-number-of-terms-in-IDF-tp26282578p26841521.html Sent from the Lucene - Java Users mailing list archive at Nabble.com. --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org