Re: Query Expansion Module for Lucene based on BM25 ranking function

2008-10-23 Thread Joaquin Perez Iglesias
Hi Grant and Jose, just to give some more details: as Jose said, avg_length is precalculated at indexing time using a specific Similarity class. Basically this can be done through the lengthNorm method: for each document and field the total length is stored, when the indexing process is finish
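
The bookkeeping described above can be sketched roughly as follows. This is only an illustration in plain Java, not code from the module or from Lucene: the class AvgLengthSketch and its methods record, averageLength and bm25TermFrequency are hypothetical names, and the sketch assumes that the per-field token count seen at indexing time (for example inside a custom Similarity's lengthNorm) is passed to record once per document. The last method uses the standard BM25 term-frequency normalization to show where the average length is needed.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper; not part of Lucene or of the BM25 module. */
public class AvgLengthSketch {
    // field name -> sum of token counts over all recorded documents
    private final Map<String, Long> totalLength = new HashMap<>();
    // field name -> number of documents recorded for that field
    private final Map<String, Long> docCount = new HashMap<>();

    /** Assumed to be called once per document and field at indexing time. */
    public void record(String field, int numTokens) {
        totalLength.merge(field, (long) numTokens, Long::sum);
        docCount.merge(field, 1L, Long::sum);
    }

    /** Average field length, available once indexing has finished. */
    public double averageLength(String field) {
        long docs = docCount.getOrDefault(field, 0L);
        if (docs == 0) {
            return 0.0;
        }
        return (double) totalLength.get(field) / docs;
    }

    /** Standard BM25 term-frequency component with length normalization. */
    public static double bm25TermFrequency(double tf, double docLength,
                                           double avgLength, double k1, double b) {
        double norm = 1.0 - b + b * (docLength / avgLength);
        return (tf * (k1 + 1.0)) / (tf + k1 * norm);
    }

    public static void main(String[] args) {
        AvgLengthSketch stats = new AvgLengthSketch();
        stats.record("body", 100);
        stats.record("body", 300);
        double avg = stats.averageLength("body");
        System.out.println("avg_length(body) = " + avg);
        System.out.println("tf component = " + bm25TermFrequency(2.0, 100.0, avg, 1.2, 0.75));
    }
}
```

The totals are kept per field because BM25 normalizes each field against its own average length; how the module actually persists these values is not shown in this message.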

Re: BM25 Scoring Patch

2010-02-16 Thread Joaquin Perez Iglesias
Hi Ivan, You shouldn't set the BM25Similarity for indexing or searching. Please try removing the lines: writer.setSimilarity(new BM25Similarity()); searcher.setSimilarity(sim); Please let us/me know if you improve your results with these changes. Robert Muir wrote: Hi Ivan, I've seen

Re: BM25 Scoring Patch

2010-02-16 Thread JOAQUIN PEREZ IGLESIAS
>> Date: Tuesday, February 16, 2010, 11:36 AM >> yes Ivan, if possible please report >> back any findings you can on the >> experiments you are doing! >> >> On Tue, Feb 16, 2010 at 11:22 AM, Joaquin Perez Iglesias >> < >> joaquin.pe...@lsi.uned.es

Re: BM25 Scoring Patch

2010-02-16 Thread JOAQUIN PEREZ IGLESIAS
> >> >> >> --- On Tue, 2/16/10, Robert Muir wrote: >> >>> From: Robert Muir >>> Subject: Re: BM25 Scoring Patch >>> To: java-user@lucene.apache.org >>> Date: Tuesday, February 16, 2010, 11:36 AM >>> yes Ivan, if possible p

Re: BM25 Scoring Patch

2010-02-16 Thread JOAQUIN PEREZ IGLESIAS
> Note: I have no bias against BM-25, but it's definitely a myth to say there > is a single retrieval formula that is the 'best' across the board. > > > On Tue, Feb 16, 2010 at 1:53 PM, JOAQUIN PEREZ IGLESIAS < > joaquin.pe...@lsi.uned.es> wrote: > >> By the w

Re: BM25 Scoring Patch

2010-02-16 Thread JOAQUIN PEREZ IGLESIAS
best if we > can > support other models also! > > Finally I think there is something to be said for Lucene's default > retrieval > model, which in my (non-English) findings across the board isn't terrible > at > all... then again I am working with languages