On Fri, Nov 13, 2009 at 3:35 PM, Max Lynch <ihas...@gmail.com> wrote:
> > query: "San Francisco" "California" +("John Smith" "John Smith > > Manufacturing") > > > > Here the San Fran and CA clauses are optional, and the ("John Smith" OR > > "John Smith Manufacturing") is required. > > > > Thanks Jake, that works nicely. > > Now, I would like to know exactly what term was found. For example, if a > result comes back from the query above, how do I know whether John Smith > was > found, or both John Smith and his company, or just John Smith Manufacturing > was found? In general, this is actually very hard. Lucene does not even keep track itself of which terms in a given query matched a given document, but you really just need to know which terms matched in the final "top hits" you're showing to the user, right? What is this information used for / why do you want to know which term hit? -jake > The way I am doing that right now is using a highlighter (which > unfortunately breaks up "John Smith" into <b>John</b><b>Smith</b>) and > combining the terms that are to be highlighted and keeping track of them so > I know they were found. If there was a simple way to just check which part > of that query was matched that would be awesome. This is why I was > thinking > of using the term boosting and using a threshold to say "Well, if the score > is above this value, then I can assume that "John Smith" was found, but if > the score is under a certain threshold, I can say that only his company was > found", without having to use the highlighter and noting when a term I'm > looking for is to be highlighted. Is there a solution? > > Thanks, > Max >