Re: QueryParser, phrases and stopwords

Erik Hatcher Thu, 16 Jun 2005 07:24:13 -0700

Are there any other issues or concerns with making this change toStopFilter? Should we make this change in 1.9? Or wait until after2.0 is released?

Mike - if you could create some test cases for this scenario andcontribute your patch and tests to Bugzilla, barring no objections,I'll apply it.


    Erik


On Jun 16, 2005, at 8:57 AM, Mike Barry wrote:

Erik,

Thanks, I applied the changes found in version 150148 ofStopFilter.javaand they work great for me. I did remove the setting of position=1beforethe return of the token since that seemed spurious to me. Here's acontext

diff of the current StopFilter.java and my changes:

*** analysis/StopFilter.java.old        Thu Jun 16 07:42:28 2005
--- analysis/StopFilter.java    Thu Jun 16 08:44:50 2005
***************
*** 94,109 ****

* Returns the next input Token whose termText() is not a stopword.

     */
    public final Token next() throws IOException {
-     int position = 1;
-
      // return the first non-stop word found
!     for (Token token = input.next(); token != null; token =
input.next()) {
!       if (!stopWords.contains(token.termText)) {
!         token.setPositionIncrement( position );
          return token;
-       }
-       position++;
-     }
      // reached EOS -- return null
      return null;
    }
--- 94,103 ----

* Returns the next input Token whose termText() is not a stopword.

     */
    public final Token next() throws IOException {
      // return the first non-stop word found

! for (Token token = input.next(); token != null; token =input.next())

!       if (!stopWords.contains(token.termText))
          return token;
      // reached EOS -- return null
      return null;
    }




Erik Hatcher wrote:


On Jun 15, 2005, at 12:12 PM, Mike Barry wrote:

I have a situation where a query such as "climate control" isreturning

documents with the phrase "climate of control".  (I'm using
QueryParser).

After searching, I found  the similar issue on the mailing list from
Greg Robertson
with a patch from Steve Rowe.

Looking at the source repository for StopFilter.java, the patch was
applied
in November of 2003 and then reverted in Dec 2003 (by Erik), with
the note:

revert position increment change due to conflict with PhraseQuery

(the patch incremented the token position to inhibit exactmatching

across
removed stopword(s)).

I couldn't find any info on how/why this approach conflicted with
PhraseQuery.
Can anyone elighten me on this? Does anyone know of a way to inhibit
exact matching across removed stopwords(s)?



PhraseQuery originally did not account for gaps left in the terms of
the phrase.

PhraseQuery was modified last year to allow for this though:

r150509 | goller | 2004-09-15 05:38:50 -0400 (Wed, 15 Sep 2004) | 5
lines

PhraseQuery and PhrasePrefixQuery are extended. It's now
possible to specify the relative position of a term within
a phrase. This allows gaps and multiple terms at the same
position.
-----

So we could change StopFilter to put the gaps back in safely now, I
think.

Thoughts?

    Erik


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: QueryParser, phrases and stopwords

Reply via email to