The way I've always done this is to index two fields, say "contents" and "contents_unstemmed" (using a PerFieldAnalyzerWrapper so each field gets its own analyzer), and then query on both of them. This has two effects: a) it boosts unstemmed hits, because every document that matches the unstemmed term also matches the stemmed one, so a BooleanQuery combining the stemmed and unstemmed queries scores those documents higher; and b) it lets you query *only* the unstemmed field when, e.g., the user puts their search term in quotes to indicate they really want an exact match.
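To make the scoring effect concrete, here is a toy, stdlib-only Java model of the two-field idea. It is a sketch under stated assumptions and does not use Lucene's API. Each document keeps a stemmed "contents" set and a raw "contents_unstemmed" set built from the same text. The `stem` method is a hypothetical crude suffix stripper that deliberately conflates "olives" and "oliver" the way the original poster's stemmer does. `score` simply counts matching fields, as a rough stand-in for a BooleanQuery with two SHOULD clauses, and `exactOnly` models the quoted-search case.

```java
import java.util.*;

public class TwoFieldSearch {
    // Hypothetical crude suffix stripper standing in for a real stemmer (e.g. Porter).
    // It maps "olives", "oliver" and "oliver's" all to "oliv".
    static String stem(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        if (w.endsWith("'s")) w = w.substring(0, w.length() - 2);
        if (w.length() > 4 && (w.endsWith("es") || w.endsWith("er"))) return w.substring(0, w.length() - 2);
        if (w.length() > 3 && w.endsWith("s")) return w.substring(0, w.length() - 1);
        return w;
    }

    // Lowercased tokens; apostrophes are kept so "oliver's" stays one token.
    static List<String> tokenize(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.split("[^A-Za-z']+")) {
            if (!t.isEmpty()) out.add(t.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    // One document holding both fields, built from the same source text.
    record Doc(String id, Set<String> contents, Set<String> contentsUnstemmed) {
        static Doc of(String id, String text) {
            Set<String> stemmed = new HashSet<>(), raw = new HashSet<>();
            for (String t : tokenize(text)) {
                stemmed.add(stem(t));
                raw.add(t);
            }
            return new Doc(id, stemmed, raw);
        }
    }

    // One point per matching field; exactOnly consults only the unstemmed field.
    static int score(Doc d, String term, boolean exactOnly) {
        String raw = term.toLowerCase(Locale.ROOT);
        int s = d.contentsUnstemmed().contains(raw) ? 1 : 0;
        if (!exactOnly && d.contents().contains(stem(raw))) s++;
        return s;
    }

    // Returns ids of matching docs, best score first, ties broken by id.
    static List<String> search(List<Doc> docs, String term, boolean exactOnly) {
        List<Doc> hits = new ArrayList<>();
        for (Doc d : docs) {
            if (score(d, term, exactOnly) > 0) hits.add(d);
        }
        hits.sort(Comparator.comparingInt((Doc d) -> score(d, term, exactOnly)).reversed()
                .thenComparing(Doc::id));
        List<String> ids = new ArrayList<>();
        for (Doc d : hits) ids.add(d.id());
        return ids;
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(
                Doc.of("d1", "Oliver's recipe book"),
                Doc.of("d2", "Marinated olives and bread"),
                Doc.of("d3", "Oliver went home"));
        System.out.println(search(docs, "olives", false)); // prints [d2, d1, d3]
        System.out.println(search(docs, "olives", true));  // prints [d2]
    }
}
```

The "oliver" documents still match through the stemmed field, but the document containing the literal word "olives" ranks first because it matches both fields. Dropping the stemmed clause removes the "oliver" hits entirely.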
-jake

On 2/11/08, Michael Stoppelman <[EMAIL PROTECTED]> wrote:
> Hi all,
> I've got an index with tokens that are stemmed. Sometimes I really need to
> boost the unstemmed version of a query word to get the most relevant documents.
>
> Example:
> Query: [olives].
>
> I don't want to match documents with the words: oliver, oliver's, etc...
>
> Since I'm stemming when creating the index is there a way to store both
> versions (stemmed/unstemmed) with setIncrementPosition()? Is that the correct
> way to deal with this? I was reading old archives and this didn't seem
> to be a great way decision since it breaks PhraseQuery [1].
>
> It seems like it would be useful if at query scoring time if I could see the
> original string values of the tokens in this case at least.
>
> Thanks in advance,
>
> -M
>
> [1] http://www.mail-archive.com/[EMAIL PROTECTED]/msg07416.html