Re: stemmed search and exact match on "same" field

Robert Watkins Tue, 15 Aug 2006 03:54:10 -0700

Thank you, Chris. You have confirmed what I had all but resigned myself
to (and you summarized my goal precisely). I am sticking with the two
versions of the field and just accepting the fact that the search
clients will need to use my custom query parser.


Even if one doesn't get the answer one wants, it's at least comforting
to get a definitive answer so that the speculation can stop and the work
can get done.

thanks again,
-- Robert

On Mon, 14 Aug 2006, Chris Hostetter wrote:


Therre's a lot of information in your email, and a lot of questions that
relate to similar topics and address different ways of acomplishing
similar but different things ... too much for me to digest
all at once, so lemme start by seeing if i can summarize your goal, and
then give you my suggestion based on the goal as i see it...

You want simple term matches to be "stemmed" but you want phrase ueries to
be "unstemmed"

so if i user queries for the word...
        jumped
...you want that to match any of the words: jump, jumps, jumped, etc...

if a user queries for...
        "the dogs"
...you want that to only match the exact phrase and not something with the
tokens "the dog"

you want these ideas to work, even if phrases and terms are mixed in
the users query...
        foo:jumped bar:"the dogs"

My first though is that you kepe using two versions of hte field (one
stemmed and one unstemmed) and you then subclass QueryParser and override
the getFieldQuery(String field, String queryText) method ... if the second
arg looks like a phrase to you (ie: it has spaces or what not) them return
super.getField(field, queryText).  If it's not a phrase, then call
super.getField(field + "_STEMMED", queryText).

where this breaks down is if you want the non-stemmed behavior even if hte
users "phrase" only contains one word, ie...
        foo:jumped bar:"dogs"
...because the information that "dogs" was in quotes is lost by the time
getFieldQuery is called.  You'd have to write a lot more QueryParsing code
to get that behavior.


In general, for your goal, i would not attempt to put both teh stemmed and
unstemmed tokens in the same field -- because as i think you mentioned,
there is not way to tell them apart at query time.


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: stemmed search and exact match on "same" field

Reply via email to