Hi Uwe,

Thanks for clarifying. The link you gave explains this well.
So in a business scenario where we have to make a decision based on an "accepted" level of matching for a document (say, perform activity A only when a document matches more than 50%), we won't be able to rely on the match score, because the score changes with the query: sometimes an 80% match may not be as close as a 5% match from a slightly different query. (I know I am going back to % again :)

So how do we handle such a scenario?

Thanks
Saurabh

On Wed, Aug 3, 2011 at 1:34 AM, Uwe Schindler <u...@thetaphi.de> wrote:

> Hi Saurabh,
>
> There is nothing wrong with Lucene. The problem is generally that you try
> to see scores as percentages, which they aren't. Scores are arbitrary
> values, only used for sorting search results, but never to compare results
> between different queries. In fact, it is easily possible to get back
> values >1.0.
>
> Your examples do the right thing: the sorting is the same in both cases.
> The actual score values are *arbitrary*!
>
> See http://wiki.apache.org/lucene-java/ScoresAsPercentages for an
> explanation.
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
> From: Saurabh Gokhale [mailto:saurabhgokh...@gmail.com]
> Sent: Wednesday, August 03, 2011 12:39 AM
> To: java-user@lucene.apache.org
> Subject: Multiple Query clauses impacting result
>
> Hi All,
>
> As I add new clauses to the Boolean Query, my queryNorm value goes down,
> which is impacting the results.
>
> For example (the complete standalone application is attached to the
> email; I am using Lucene 3.1.0):
>
> I indexed the following 6 documents:
>
> addDoc("author1", "My first book", "123"); --> 1st column == author name,
> 2nd = subject, 3rd column = isbn #
> addDoc("author2", "My next book", "333");
> addDoc("author2", "this first text", "444");
> addDoc("author3", "test the knowledge", "456");
> addDoc("author4", "knowledge is vertue", "789");
> addDoc("author5", "saurabh", "222");
>
> The Boolean Query given below generates the following result:
>
> Query = (author:author1) (subject:book subject:first subject:my) -isbn:123
>
> Match: 26.498592% || Doc Author: author2 || Doc subject: My next book ||
> Doc ISBN: 333
> Match: 8.280809% || Doc Author: author2 || Doc subject: this first text ||
> Doc ISBN: 444
>
> Now, if I add a new clause to this Boolean Query, in this case a spanNear
> query whose search values do not exist in the index, my result percentages
> go down.
>
> Query = (author:author1) (subject:book subject:first subject:my) -isbn:123
> spanNear([subject:not, subject:found], 3, true)
>
> Match: 9.584372% || Doc Author: author2 || Doc subject: My next book ||
> Doc ISBN: 333
> Match: 2.995116% || Doc Author: author2 || Doc subject: this first text ||
> Doc ISBN: 444
>
> The problem is that the same documents which matched at 26 and 8 percent
> in the first query result now match at 9 and 2 percent. Ideally I would
> not expect any change in the result percentages, as all my clauses are
> combined with Boolean OR. But because the queryNorm factor changes when
> the new clause is added, my result is impacted. (You can see the complete
> code in the attached Java file.)
>
> Now consider a scenario where my job is to find whether 100 special words
> (either single words or combinations of multiple words) are present in a
> document or not. My result will go way down, because not all documents
> will have those words and my queryNorm will be very low due to the
> addition of 99 OR Boolean clauses.
>
> Is there a way I can get a consistent result regardless of the OR clauses
> I add to my query? I mean, is there a way I can control the queryNorm, if
> that is the root cause?
>
> Thanks
>
> Saurabh
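
To illustrate the point that scores are only meaningful relative to each other within one query, here is a minimal sketch in plain Java (no Lucene dependency). It uses the four match values reported in this thread and shows that the ratio between the two hits is the same with and without the extra spanNear clause. It also shows the queryNorm formula of Lucene's DefaultSimilarity, 1 / sqrt(sumOfSquaredWeights). The class name, the helper methods, and the sum-of-squared-weights inputs (4.0 and 16.0) are made up for illustration and are not taken from the attached application.

```java
public class ScoreRatioSketch {

    // queryNorm as computed by Lucene's DefaultSimilarity:
    // 1 / sqrt(sumOfSquaredWeights). It is one value per query,
    // applied to every matching document alike.
    public static double queryNorm(double sumOfSquaredWeights) {
        return 1.0 / Math.sqrt(sumOfSquaredWeights);
    }

    // Ratio between two scores that come from the same query.
    public static double ratio(double topScore, double secondScore) {
        return topScore / secondScore;
    }

    public static void main(String[] args) {
        // Match values reported in the thread (ISBN 333 vs. ISBN 444).
        double withoutSpanNear = ratio(26.498592, 8.280809);
        double withSpanNear = ratio(9.584372, 2.995116);

        // Both ratios come out at about 3.2: the absolute values moved,
        // the relative ordering of the two documents did not.
        System.out.println("ratio without spanNear clause: " + withoutSpanNear);
        System.out.println("ratio with spanNear clause:    " + withSpanNear);

        // Hypothetical inputs: more clauses means a larger sum of squared
        // weights, hence a smaller queryNorm for the whole query.
        System.out.println("queryNorm(4.0)  = " + queryNorm(4.0));
        System.out.println("queryNorm(16.0) = " + queryNorm(16.0));
    }
}
```

Because such query-level factors scale every hit of a query in the same way, the sketch is consistent with Uwe's remark that the sorting is the same in both cases even though the absolute numbers differ.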