Are there any other issues or concerns with making this change to StopFilter? Should we make this change in 1.9? Or wait until after 2.0 is released?

Mike, if you could create some test cases for this scenario and contribute your patch and tests to Bugzilla, I'll apply it, barring any objections.

    Erik


On Jun 16, 2005, at 8:57 AM, Mike Barry wrote:

Erik,
Thanks. I applied the changes found in version 150148 of StopFilter.java and they work great for me. I did remove the setting of position=1 before the token is returned, since that seemed spurious to me. Here's a context diff of the current StopFilter.java and my changes:

*** analysis/StopFilter.java.old        Thu Jun 16 07:42:28 2005
--- analysis/StopFilter.java    Thu Jun 16 08:44:50 2005
***************
*** 94,109 ****
     * Returns the next input Token whose termText() is not a stop word.
     */
    public final Token next() throws IOException {
-     int position = 1;
-
      // return the first non-stop word found
!     for (Token token = input.next(); token != null; token = input.next()) {
!       if (!stopWords.contains(token.termText)) {
!         token.setPositionIncrement( position );
          return token;
-       }
-       position++;
-     }
      // reached EOS -- return null
      return null;
    }
--- 94,103 ----
     * Returns the next input Token whose termText() is not a stop word.
     */
    public final Token next() throws IOException {
      // return the first non-stop word found
!     for (Token token = input.next(); token != null; token = input.next())
!       if (!stopWords.contains(token.termText))
          return token;
      // reached EOS -- return null
      return null;
    }
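
For anyone who wants to try the gap-preserving behaviour outside Lucene, here is a minimal, self-contained sketch. SimpleToken and GapPreservingStopFilter are hypothetical stand-ins, not Lucene's Token/TokenFilter classes; in Lucene the equivalent change goes through Token.setPositionIncrement(), as in the old code above. One design choice differs from that old position counter: the sketch adds the increments of the removed stop words to the kept token's existing increment instead of overwriting it, so a token that already carries an increment other than 1 keeps it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

/** Simplified model of a stop filter that preserves gaps (hypothetical, not Lucene's API). */
class GapPreservingStopFilterDemo {

    /** Hypothetical token: term text plus position increment relative to the previous token. */
    static final class SimpleToken {
        final String term;
        int positionIncrement;

        SimpleToken(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }
    }

    /** Drops stop words but folds their position increments into the next kept token. */
    static final class GapPreservingStopFilter {
        private final Iterator<SimpleToken> input;
        private final Set<String> stopWords;

        GapPreservingStopFilter(Iterator<SimpleToken> input, Set<String> stopWords) {
            this.input = input;
            this.stopWords = stopWords;
        }

        /** Returns the next non-stop token, or null at end of stream. */
        SimpleToken next() {
            int skippedPositions = 0;
            while (input.hasNext()) {
                SimpleToken token = input.next();
                if (!stopWords.contains(token.term)) {
                    // Add the positions occupied by removed stop words
                    // rather than overwriting the token's own increment.
                    token.positionIncrement += skippedPositions;
                    return token;
                }
                skippedPositions += token.positionIncrement;
            }
            return null; // reached end of stream
        }
    }

    /** Whitespace-tokenizes (increment 1 per word), filters, and returns "term@position" strings. */
    static List<String> positions(String text, Set<String> stopWords) {
        List<SimpleToken> raw = new ArrayList<>();
        for (String word : text.split("\\s+")) {
            raw.add(new SimpleToken(word, 1));
        }
        GapPreservingStopFilter filter = new GapPreservingStopFilter(raw.iterator(), stopWords);
        List<String> out = new ArrayList<>();
        int position = -1; // first token with increment 1 lands on position 0
        for (SimpleToken t = filter.next(); t != null; t = filter.next()) {
            position += t.positionIncrement;
            out.add(t.term + "@" + position);
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> stopWords = new HashSet<>(Arrays.asList("of", "the", "a"));
        System.out.println(positions("climate of control", stopWords)); // [climate@0, control@2]
        System.out.println(positions("climate control", stopWords));    // [climate@0, control@1]
    }
}
```

With the gap kept, "climate of control" puts control at position 2 rather than 1, so an exact phrase query for "climate control" (positions 0 and 1) should no longer match it.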




Erik Hatcher wrote:



On Jun 15, 2005, at 12:12 PM, Mike Barry wrote:


I have a situation where a query such as "climate control" is returning documents with the phrase "climate of control". (I'm using QueryParser.)

After searching, I found a similar issue on the mailing list from Greg Robertson, with a patch from Steve Rowe.

Looking at the source repository for StopFilter.java, I see that the patch was applied in November 2003 and then reverted in December 2003 (by Erik), with the note:

revert position increment change due to conflict with PhraseQuery

(The patch incremented the token position to inhibit exact matching across removed stopword(s).)

I couldn't find any info on how or why this approach conflicted with PhraseQuery. Can anyone enlighten me on this? Does anyone know of a way to inhibit exact matching across removed stopword(s)?



PhraseQuery originally did not account for gaps left in the terms of the phrase.

PhraseQuery was modified last year to allow for this though:

r150509 | goller | 2004-09-15 05:38:50 -0400 (Wed, 15 Sep 2004) | 5 lines

PhraseQuery and PhrasePrefixQuery are extended. It's now
possible to specify the relative position of a term within
a phrase. This allows gaps and multiple terms at the same
position.
-----

So we could change StopFilter to put the gaps back in safely now, I think.
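
A toy positional matcher illustrates the PhraseQuery side. One plausible reading of the "conflict with PhraseQuery" note: once the index keeps gaps, a phrase query that assumes its terms sit at consecutive positions cannot match even the exact phrase "climate of control", because the analyzed query (climate, control) is looked up at offsets 0 and 1 while the document has those terms at 0 and 2. With relative positions (r150509; in Lucene 1.9 this surfaces as PhraseQuery.add(Term, int position)), the query can carry the same gap. PhraseGapDemo, index() and matches() are hypothetical and model only exact (slop 0) matching in a single document.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy exact-phrase matcher over one document's positions (hypothetical, not Lucene's PhraseQuery). */
class PhraseGapDemo {

    /** Builds term -> positions for one document from "term@position" entries. */
    static Map<String, Set<Integer>> index(String... termAtPos) {
        Map<String, Set<Integer>> doc = new HashMap<>();
        for (String entry : termAtPos) {
            String[] parts = entry.split("@");
            doc.computeIfAbsent(parts[0], k -> new HashSet<Integer>()).add(Integer.parseInt(parts[1]));
        }
        return doc;
    }

    /** True if, for some start, every terms[i] occurs at start + (offsets[i] - offsets[0]). */
    static boolean matches(Map<String, Set<Integer>> doc, List<String> terms, int[] offsets) {
        Set<Integer> firstPositions = doc.getOrDefault(terms.get(0), Collections.emptySet());
        for (int start : firstPositions) {
            int base = start - offsets[0];
            boolean all = true;
            for (int i = 0; i < terms.size() && all; i++) {
                Set<Integer> p = doc.getOrDefault(terms.get(i), Collections.emptySet());
                all = p.contains(base + offsets[i]);
            }
            if (all) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // "climate of control" indexed with the stop word's gap kept
        Map<String, Set<Integer>> gapped = index("climate@0", "control@2");
        // "climate control" indexed normally
        Map<String, Set<Integer>> plain = index("climate@0", "control@1");
        List<String> query = Arrays.asList("climate", "control");

        System.out.println(matches(gapped, query, new int[] {0, 1})); // false: no match across "of"
        System.out.println(matches(gapped, query, new int[] {0, 2})); // true: query carries the gap
        System.out.println(matches(plain, query, new int[] {0, 1}));  // true: ordinary phrase match
    }
}
```

The first line is also the failure mode for the old, consecutive-position PhraseQuery when the query text itself is "climate of control": the gap exists in the index but not in the query.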

Thoughts?

    Erik


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


