Hi, we are having a discussion on java-dev@lucene.apache.org about implementing probabilistic language modelling approaches such as BM25 in Lucene. We hope you can join us there.
Jianhan

-----Original Message-----
From: beatriz ramos [mailto:[EMAIL PROTECTED]
Sent: 19 October 2006 16:36
To: java-user@lucene.apache.org
Cc: [EMAIL PROTECTED]
Subject: Re: implementing our own Scorer (BM25)

Excuse me, I didn't want to write a very long email.

This is the BM25 scoring formula:

    log((N-f+0.5)/(f+0.5)) * (k1 + 1) * c / (c + k1*((1-b) + b*l/L))

where
    N = total number of documents
    f = document frequency (number of documents that contain the term)
    c = term frequency in a document
    l = length of the document
    L = average document length
    k1, b = constants

I think f corresponds to the document frequency behind idf in the default Lucene scoring formula, and c corresponds to tf.

I implemented the BM25 formula in the score() method of BM25Scorer, my own class that extends Scorer:

    import org.apache.lucene.search.Scorer;
    import org.apache.lucene.search.Similarity;

    public class BM25Scorer extends Scorer {
        public BM25Scorer(Similarity similarity) {
            super(similarity);
        }
        // score() and the other abstract Scorer methods are omitted here
    }

The problem is that I would have to implement my own Similarity class with some specific abstract methods such as queryNorm(float sumOfSquaredWeights), but I don't know how to calculate sumOfSquaredWeights from the parameters of the BM25 formula.

Do I only have to change the Query, Weight and Scorer classes, or do I also need to create my own Similarity class?

Thanks

On 19/10/06, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
>
> Please provide more information about what you have done so far.
>
> On Oct 19, 2006, at 9:10 AM, beatriz ramos wrote:
>
> > Hello,
> > I'm trying to implement my own scoring algorithm with Lucene but I
> > don't get any results.
> >
> > The Lucene documentation explains how to implement new scoring by
> > modifying the Query, Weight and Scorer classes. I have tried this but
> > it doesn't work.
> >
> > Do you have any idea?
> > I need an example to understand the process and the modifications.
> >
> > Thanks
>
> --------------------------
> Grant Ingersoll
> Sr. Software Engineer
> Center for Natural Language Processing
> Syracuse University
> 335 Hinds Hall
> Syracuse, NY 13244
> http://www.cnlp.org
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
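
For reference, the per-term BM25 weight quoted in this thread can be sketched in plain Java, independent of any Lucene class. This is only an illustration under stated assumptions: the class name Bm25Sketch and its method names are hypothetical, the logarithm is taken to be the natural log, and the parameter names map onto the symbols in the message (N, f, c, l, L, k1, b). It is not the code from the original poster's BM25Scorer.

```java
/** Illustrative sketch of the per-term BM25 weight; not a Lucene class. */
final class Bm25Sketch {

    private Bm25Sketch() {}

    /**
     * The log((N - f + 0.5) / (f + 0.5)) factor.
     * Assumption: natural logarithm. The result is negative when the term
     * occurs in more than half of the documents.
     */
    static double idf(double numDocs, double docFreq) {
        return Math.log((numDocs - docFreq + 0.5) / (docFreq + 0.5));
    }

    /**
     * Full per-term weight:
     * idf * (k1 + 1) * c / (c + k1 * ((1 - b) + b * l / L)).
     *
     * numDocs = N, docFreq = f, termFreq = c,
     * docLength = l, avgDocLength = L; k1 and b are the tuning constants.
     */
    static double score(double numDocs, double docFreq, double termFreq,
                        double docLength, double avgDocLength,
                        double k1, double b) {
        // Length normalisation: equals 1 for a document of average length.
        double lengthNorm = (1.0 - b) + b * docLength / avgDocLength;
        // Saturating term-frequency component.
        double tfPart = (k1 + 1.0) * termFreq / (termFreq + k1 * lengthNorm);
        return idf(numDocs, docFreq) * tfPart;
    }
}
```

A call such as Bm25Sketch.score(1000, 10, 3, 120, 100, 1.2, 0.75) gives the weight of one term in one document; a document's score for a query would be the sum of these weights over the query terms. On the queryNorm question: as far as I know, Lucene's DefaultSimilarity returns 1/sqrt(sumOfSquaredWeights), which is the same constant for every document matched by a query, so it should not change the ranking order. Treat that as something to check against the Similarity Javadoc for your Lucene version rather than as a definitive answer.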