: So in a business scenario where we have to make a decision based on the
: "accepted" matching of a document (say, perform activity A only when a
: document matches more than 50%), we won't be able to rely on the match score,
: because the score will change based on our query, and sometimes an 80% match
: may not be as close as a 5% match with a slightly different query. (I know
: I am going back to % again :)
:
: So how do we handle such a scenario?

you have to redefine your criteria. "50% match" is meaningless: you have to
decide what that means. does it mean matching half of the clauses in a
boolean query? what if a doc matches only 1/3 of the clauses, but it matches
them 100 times each? what if it matches 1/2 the clauses, 100 times each, but
that only makes up a tiny fraction of the total terms in that document (ie:
it's got the entire contents of wikipedia in every field)? what if the query
isn't a boolean query but a phrase query?

if you have a constrained set of possible queries, and you can define
precisely what rules you care about, you can modify your similarity class
so that, regardless of the index, it produces scores that you *can* use to
make inferences given your rules.

See Also...

http://www.gossamer-threads.com/lists/lucene/java-user/61075
http://markmail.org/thread/3svvskbay4hpqyms
http://markmail.org/message/lztdm4xosmceup5t

And a real oldy but goodie...

http://markmail.org/message/5eipstcu6lky2h2j

-Hoss
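
To make "define precisely what rules you care about" concrete, here is a minimal sketch of one possible rule: "a document is accepted when it contains more than half of the distinct query terms, no matter how often each one occurs". It is plain Java and deliberately does not use the Lucene API; the class name ClauseCoverage and the methods coverage and accept are hypothetical names chosen for illustration, and the term lists stand in for whatever your analyzer would produce. The point is only that a rule stated this way gives the same answer regardless of what else is in the index.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Lucene: measures a rule-based "match"
// that does not depend on index-wide statistics such as idf.
public class ClauseCoverage {

    // Fraction of distinct query terms that occur at least once in the document.
    // Term frequency is ignored on purpose: matching a term 100 times counts once.
    static double coverage(List<String> queryTerms, List<String> docTerms) {
        Set<String> wanted = new HashSet<>(queryTerms);
        if (wanted.isEmpty()) {
            return 0.0;
        }
        Set<String> present = new HashSet<>(docTerms);
        int matched = 0;
        for (String term : wanted) {
            if (present.contains(term)) {
                matched++;
            }
        }
        return (double) matched / wanted.size();
    }

    // The business rule: accept only when coverage is strictly above the threshold.
    static boolean accept(List<String> queryTerms, List<String> docTerms, double threshold) {
        return coverage(queryTerms, docTerms) > threshold;
    }

    public static void main(String[] args) {
        List<String> query = Arrays.asList("lucene", "score", "normalize", "percent");
        List<String> doc = Arrays.asList("lucene", "score", "score", "score", "index");
        System.out.println(coverage(query, doc));    // prints 0.5 (2 of 4 terms present)
        System.out.println(accept(query, doc, 0.5)); // prints false (0.5 is not above 0.5)
    }
}
```

Whether repeated matches, document length, or phrase order should count is exactly the decision described above; each choice would change the rule, and this sketch picks only the simplest one.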