Hello Lucene experts, I am working on a perhaps interesting problem. I am using Lucene as an IR engine that lets users search for documents. Additionally, I use a user model that produces a second score. This second score represents a different aspect of document relevance and is based on data from a previous experiment. It is derived from data other than the document content and has nothing to do with the TF/IDF formula used in the Lucene engine. You may think of it as an additional model that also produces a score, but one based on another view of relevance: grounded statistically in user opinion rather than deterministically in word counts and the distribution of words over the document collection.
My aim is to combine these two scores. The Lucene score is normalised: it lies between 0.0 and 1.0 if the raw score exceeded 1.0 at some point, and simply stays below 1.0 if it did not. The user model looks the same in this respect, although it is based on different data: 1.0 means maximum relevance and 0.0 means minimum relevance. At the moment I multiply the Lucene score by the score produced by the user model. The resulting combined score is therefore a number between 0.0 and 1.0 and represents the merged view of both models, the IR view and the view of the user model.

I have a question about this. Multiplying both scores seemed obvious to me until recently, mainly because I have seen it done before and because it seemed to deliver good results in initial testing. But that is a weak basis, and I am nervous about the mathematical foundation, or at least about having a decent justification for it. Does somebody here know of similar work, or has anyone worked on similar issues and can share some ideas or perhaps point me to some papers that address them?

I would be interested in discussing the issue of score combination in general. My particular problem is that I do not stay within the pure IR field (which is covered by the IR literature) but combine it with other models. More generally, I would like to know your opinion on whether this is a good idea or not. Technical constraints require me to combine these two scores, which are explicitly independent. I am now looking for a decent way to do that without violating the meaning of the TF/IDF model. Mathematically, I have to find the function f(L,U), where L is the Lucene TF/IDF model and U is my user model.

Kind regards,
Karl
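
P.S. To make the question about f(L,U) concrete, here is a minimal Python sketch of three candidate forms: the plain product described above, a weighted geometric mean, and a weighted linear interpolation. All function names, the weight parameter w, and the three example documents are made up for illustration; nothing here is Lucene API, and the sketch assumes both scores already lie in [0.0, 1.0].

```python
def _check_unit(name, value):
    # Both scores are assumed to be normalised to [0.0, 1.0].
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"{name} must lie in [0.0, 1.0], got {value}")


def combine_product(l, u):
    """Plain product of the Lucene score l and the user-model score u."""
    _check_unit("l", l)
    _check_unit("u", u)
    return l * u


def combine_geometric(l, u, w=0.5):
    """Weighted geometric mean l**w * u**(1 - w); w is a hypothetical weight."""
    _check_unit("l", l)
    _check_unit("u", u)
    _check_unit("w", w)
    return (l ** w) * (u ** (1.0 - w))


def combine_linear(l, u, w=0.5):
    """Weighted linear interpolation w*l + (1 - w)*u; w is a hypothetical weight."""
    _check_unit("l", l)
    _check_unit("u", u)
    _check_unit("w", w)
    return w * l + (1.0 - w) * u


def rank(docs, combine):
    """Return document ids sorted by combined score, best first."""
    return sorted(docs, key=lambda d: combine(*docs[d]), reverse=True)


# Hypothetical documents: id -> (Lucene score, user-model score).
docs = {"a": (0.9, 0.2), "b": (0.5, 0.5), "c": (0.3, 0.9)}

for f in (combine_product, combine_geometric, combine_linear):
    print(f.__name__, rank(docs, f))
```

With equal weights, the geometric mean is just the square root of the product, so it always yields the same ranking as the product; only the score values differ. The linear form can reorder documents, because a high score from one model can compensate for a low score from the other, whereas in the product a score near 0.0 from either model pulls the combined score towards 0.0.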