Re: Proximity and Percentage match search in Lucene

Chris Hostetter Tue, 28 Apr 2009 15:44:15 -0700

Radha: replying/reforwarding the same message over and over doesn't tend 
to be a useful way to encourage additional replies.  if you do have 
something to add to an existing discussion that you've started, you should 
at least do it as a reply to the orriginal discussion so people have the 
full context...
  http://www.nabble.com/Need-help-%3A-SpanNearQuery-to23077372.html


I'm really not sure that anyone has any new additional info to offer you.  
this is a particularly hard problem, that doesn't have a very efficient 
solution that i know of.  it sounds like you've already tried the obvious 
solution, but aren't happy with the scores produced -- if you can't get 
the scores you want by tweaking the Similarity options available, then 
implementing custom Query/Scorer classes is really the only remaining 
option.


: Date: Sun, 19 Apr 2009 17:52:27 +0530
: From: Radha Sreedharan <radh...@gmail.com>
: Reply-To: java-user@lucene.apache.org
: To: java-user@lucene.apache.org
: Subject: Proximity and Percentage match search in Lucene
: 
:  What I need is  the following :
: If my document field is ( ab,bc,cd,ef) and Search tokens are (ab,bc,cd).
: 
:  Given the following :
:  I should get a hit even if all of the search tokens aren't present
:  If the tokens are found they should be found within a distance x of
: each other ( proximity
: search)
: >
: > I need the percentage match of the search tokens with the document field.
: >
: > Currently this is my query :
: > 1) I form all possible permutation of the search tokens
: > 2) do a spanNearQuery of each permutation
: > 3)  Do a DisjunctionMaxQuery on the spannearqueries.
: >
: > This is how I compute % match  :
: > % match =  ( Score by running the query on the document field ) /
: >             ( score by running the query on a document field created out of 
search tokens )
: >
: > The numerator gives me the actual score with the search tokens run on the 
field.
: > Denominator gives  me the  best possible or maximum possible score with the 
current search
: tokens
: >
: > For this example << If my document field is ( ab,bc,cd,ef) and Search 
tokens are
: (ab,bc,cd).>> I expect a % match of around 90%.
: >
: > However I get a match of only around 50% without a boost. Using a boost 
infact reduces
: my percentage.
: >
: > I even overrode the queryNorm method to return a one, still the percentage 
did not increase.
: *
: Is there any way of implementing this using the current set of
: implementation classes in Lucene and not making complex changes to the
: structure by itself.
: ( which is what i gather has to be done from the previous replies)
: 
: Can anyone suggest an alternative way of implementing this requirement
: using the existing bunch of classes in Lucene and not necessarily
: using the ones I have used*
: 



-Hoss


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

Re: Proximity and Percentage match search in Lucene

Reply via email to