Others can correct me if I am wrong, but I don't think a "pure" Rocchio feedback loop is possible at present, since Lucene doesn't currently support negative boosts (http://lucene.apache.org/java/docs/queryparsersyntax.html). That said, what we do is, in a nutshell, similar to what you describe. For the positive examples, store each term with a boost factor: the frequency of the term across all the positive examples multiplied by beta. Then, for the negative examples, decrement each term's boost factor by gamma times the frequency of that term across all the negative examples. Finally, remove any terms whose boost is zero or less.
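The boost computation described above can be sketched roughly as follows. This is a plain-Python illustration, not Lucene code: the function name `feedback_boosts`, the tokenized-list input format, and the values beta=0.75 and gamma=0.25 are all assumptions made for the example.

```python
from collections import Counter


def feedback_boosts(positive_docs, negative_docs, beta=0.75, gamma=0.25):
    """Approximate Rocchio boosts from tokenized example documents.

    beta and gamma are illustrative defaults, not values from the thread.
    """
    # Frequency of each term across all positive / all negative examples.
    pos_freq = Counter(term for doc in positive_docs for term in doc)
    neg_freq = Counter(term for doc in negative_docs for term in doc)

    boosts = {}
    for term, freq in pos_freq.items():
        # Positive contribution minus the negative decrement.
        boost = beta * freq - gamma * neg_freq[term]
        # Drop terms whose boost is zero or less (no negative boosts).
        if boost > 0:
            boosts[term] = boost
    return boosts


positive = [["lucene", "index", "search"], ["lucene", "query"]]
negative = [["index", "index", "index", "backup"]]
boosts = feedback_boosts(positive, negative)
```

In this toy input, "index" occurs once in the positive examples and three times in the negative ones, so its boost comes out at zero and it is dropped. Terms that appear only in negative examples never enter the map at all.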

In the end, you construct a new query out of the remaining terms and their boosts and submit it. I think it is more an approximation of Rocchio, but I have had good results from it. You probably also want to limit the number of terms you add per document, at least if you are concerned about performance. -Grant
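As a rough illustration of the last step, the sketch below turns a term-to-boost map into a query string using the `term^boost` form from the Lucene query parser syntax. The helper name `build_query` and the `max_terms` parameter are hypothetical. Note also that it applies one global cap on the number of terms, which is a simplification of the per-document limit suggested above, and that it assumes the terms are plain words needing no escaping.

```python
def build_query(boosts, max_terms=25):
    """Build a boosted query string from a {term: boost} mapping.

    Keeps only the max_terms highest-boosted terms (ties broken
    alphabetically) so the resulting query stays cheap to run.
    """
    top = sorted(boosts.items(), key=lambda kv: (-kv[1], kv[0]))[:max_terms]
    return " ".join(f"{term}^{boost:g}" for term, boost in top)


query = build_query({"lucene": 1.5, "search": 0.75, "query": 0.75}, max_terms=2)
print(query)
```

The resulting string could then be handed to a query parser. Building the query objects programmatically would work just as well and avoids escaping issues.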

Stefan Gusenbauer wrote:

I have some thoughts about Lucene and relevance feedback. I want to implement a variation of the Rocchio formula, and that is where I have a problem.
The formula is:

Query(new) = alpha * Query(old) + beta * Sum(Relevant Documents) - gamma * Sum(Non Relevant Documents)

The relevant documents in this formula should be in a vector representation. This is the problem: if I work with TermFreqVectors, the vectors are not equally long and contain different terms. My current solution is to take the TermFreqVectors, reduce them to the least common multiple, and then perform the computation.
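One common way around the unequal-length problem, sketched here under stated assumptions, is to treat each term vector as a sparse map from term to weight and to read any missing term as weight zero. Every vector then implicitly spans the union of all terms, and the formula can be evaluated term by term. The code below is plain Python rather than Lucene API code: the function name `rocchio`, the dict representation, and the alpha, beta, and gamma values are illustrative only.

```python
def rocchio(query_vec, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.25):
    """Evaluate the formula above over sparse {term: weight} vectors.

    Terms absent from a vector count as zero, so vectors with different
    terms and lengths can be combined without padding them first.
    """
    # The working vocabulary is the union of every term seen anywhere.
    vocab = set(query_vec)
    for vec in relevant + non_relevant:
        vocab.update(vec)

    new_query = {}
    for term in vocab:
        new_query[term] = (
            alpha * query_vec.get(term, 0.0)
            + beta * sum(vec.get(term, 0.0) for vec in relevant)
            - gamma * sum(vec.get(term, 0.0) for vec in non_relevant)
        )
    return new_query


old_query = {"lucene": 1.0}
relevant_docs = [{"lucene": 2.0, "index": 1.0}]
non_relevant_docs = [{"index": 4.0, "backup": 1.0}]
new_query = rocchio(old_query, relevant_docs, non_relevant_docs)
```

Here "index" and "backup" end up with negative weights. A caller that cannot express negative weights would have to drop or clamp those terms before building a query.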
So my questions are:
Is this the only way to do it? (I hope not.)
Is there an add-on for Lucene that provides a real vector representation?
Does anyone have experience with this issue?
Thanks
Stefan

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



--
-------------------------------------------------------------------
Grant Ingersoll
Sr. Software Engineer
Center for Natural Language Processing
Syracuse University
School of Information Studies
337 Hinds Hall
Syracuse, NY 13244

http://www.cnlp.org

Voice: 315-443-5484
Fax: 315-443-6886
