Erik,

Thanks, I applied the changes found in version 150148 of StopFilter.java and they work great for me. I did remove the setting of position=1 before the return of the token, since that seemed spurious to me. Here's a context diff of the current StopFilter.java and my changes:

*** analysis/StopFilter.java.old	Thu Jun 16 07:42:28 2005
--- analysis/StopFilter.java	Thu Jun 16 08:44:50 2005
***************
*** 94,109 ****
     * Returns the next input Token whose termText() is not a stop word.
     */
    public final Token next() throws IOException {
-     int position = 1;
-
      // return the first non-stop word found
!     for (Token token = input.next(); token != null; token = input.next()) {
!       if (!stopWords.contains(token.termText)) {
!         token.setPositionIncrement( position );
          return token;
-       }
-       position++;
-     }
      // reached EOS -- return null
      return null;
    }
--- 94,103 ----
     * Returns the next input Token whose termText() is not a stop word.
     */
    public final Token next() throws IOException {
      // return the first non-stop word found
!     for (Token token = input.next(); token != null; token = input.next())
!       if (!stopWords.contains(token.termText))
          return token;
      // reached EOS -- return null
      return null;
    }

Erik Hatcher wrote:

>
> On Jun 15, 2005, at 12:12 PM, Mike Barry wrote:
>
>> I have a situation where a query such as "climate control" is returning
>> documents with the phrase "climate of control". (I'm using
>> QueryParser).
>>
>> After searching, I found the similar issue on the mailing list from
>> Greg Robertson with a patch from Steve Rowe.
>>
>> Looking at the source repository for StopFilter.java, the patch was
>> applied in November of 2003 and then reverted in Dec 2003 (by Erik),
>> with the note:
>>
>> revert position increment change due to conflict with PhraseQuery
>>
>> (the patch incremented the token position to inhibit exact matching
>> across removed stopword(s)).
>>
>> I couldn't find any info on how/why this approach conflicted with
>> PhraseQuery.
>> Can anyone enlighten me on this? Does anyone know of a way to inhibit
>> exact matching across removed stopword(s)?
>
>
> PhraseQuery originally did not account for gaps left in the terms of
> the phrase.
>
> PhraseQuery was modified last year to allow for this though:
>
> r150509 | goller | 2004-09-15 05:38:50 -0400 (Wed, 15 Sep 2004) | 5 lines
>
> PhraseQuery and PhrasePrefixQuery are extended. It's now
> possible to specify the relative position of a term within
> a phrase. This allows gaps and multiple terms at the same
> position.
> -----
>
> So we could change StopFilter to put the gaps back in safely now, I
> think.
>
> Thoughts?
>
> Erik
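
For anyone following the thread, here is a minimal, self-contained sketch of the behavior being discussed. It is plain Java and does not use the Lucene classes: `StopGapSketch`, `Tok`, `filter`, `positions`, and `phraseMatches` are all hypothetical names made up for illustration, and the phrase matching is a simplified stand-in for an exact phrase search, not PhraseQuery itself. It only shows why leaving a gap where a stop word was removed keeps "climate control" from matching "climate of control".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class StopGapSketch {
    // Hypothetical stand-in for a token: term text plus a position increment.
    static final class Tok {
        final String text;
        final int posIncr;
        Tok(String text, int posIncr) { this.text = text; this.posIncr = posIncr; }
    }

    // Drops stop words. With keepGaps, a surviving token's increment also counts
    // the stop words skipped just before it; otherwise the increment is always 1.
    static List<Tok> filter(String[] words, Set<String> stopWords, boolean keepGaps) {
        List<Tok> out = new ArrayList<>();
        int skipped = 0;
        for (String w : words) {
            if (stopWords.contains(w)) { skipped++; continue; }
            out.add(new Tok(w, keepGaps ? skipped + 1 : 1));
            skipped = 0;
        }
        return out;
    }

    // Absolute positions per term: the running sum of increments, starting at 0
    // when the first token has increment 1.
    static Map<String, List<Integer>> positions(List<Tok> toks) {
        Map<String, List<Integer>> idx = new HashMap<>();
        int pos = -1;
        for (Tok t : toks) {
            pos += t.posIncr;
            idx.computeIfAbsent(t.text, k -> new ArrayList<>()).add(pos);
        }
        return idx;
    }

    // Exact phrase match (no slop): phrase term i must occur at start + i.
    static boolean phraseMatches(Map<String, List<Integer>> idx, String... phrase) {
        for (int start : idx.getOrDefault(phrase[0], Collections.<Integer>emptyList())) {
            boolean ok = true;
            for (int i = 1; i < phrase.length && ok; i++) {
                ok = idx.getOrDefault(phrase[i], Collections.<Integer>emptyList())
                        .contains(start + i);
            }
            if (ok) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> stop = new HashSet<>(Arrays.asList("of", "the"));
        String[] doc = {"climate", "of", "control"};
        // Without gaps the two terms become adjacent, so the phrase matches.
        System.out.println(phraseMatches(positions(filter(doc, stop, false)), "climate", "control"));
        // With gaps "control" sits two positions after "climate", so it does not.
        System.out.println(phraseMatches(positions(filter(doc, stop, true)), "climate", "control"));
    }
}
```

The sketch only covers the indexing side. A phrase that itself contains a stop word would need the same gap on the query side, which is what the relative-position support described in the r150509 log message makes possible.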