Are there any other issues or concerns with making this change to
StopFilter? Should we make this change in 1.9? Or wait until after
2.0 is released?
Mike - if you could create some test cases for this scenario and
contribute your patch and tests to Bugzilla, then barring any objections,
I'll apply it.
Erik
On Jun 16, 2005, at 8:57 AM, Mike Barry wrote:
Erik,
Thanks, I applied the changes found in version 150148 of StopFilter.java
and they work great for me. I did remove the setting of position=1 before
the return of the token since that seemed spurious to me. Here's a context
diff of the current StopFilter.java and my changes:
*** analysis/StopFilter.java.old Thu Jun 16 07:42:28 2005
--- analysis/StopFilter.java Thu Jun 16 08:44:50 2005
***************
*** 94,109 ****
     * Returns the next input Token whose termText() is not a stop word.
     */
    public final Token next() throws IOException {
-     int position = 1;
-
      // return the first non-stop word found
!     for (Token token = input.next(); token != null; token = input.next()) {
!       if (!stopWords.contains(token.termText)) {
!         token.setPositionIncrement( position );
          return token;
-       }
-       position++;
-     }
      // reached EOS -- return null
      return null;
    }
--- 94,103 ----
     * Returns the next input Token whose termText() is not a stop word.
     */
    public final Token next() throws IOException {
      // return the first non-stop word found
!     for (Token token = input.next(); token != null; token = input.next())
!       if (!stopWords.contains(token.termText))
          return token;
      // reached EOS -- return null
      return null;
    }
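To make the position-increment idea in the diff concrete, here is a minimal standalone sketch. It is not Lucene code: StopGapSketch, Tok and filter are hypothetical names invented for illustration, and plain strings stand in for Lucene's Token and TokenStream. It only shows the counting scheme, where each dropped stop word adds one to the increment of the next token that is kept.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Standalone illustration only: these classes are hypothetical, not Lucene's.
class StopGapSketch {

    // A kept token plus its distance, in positions, from the previously kept token.
    static final class Tok {
        final String text;
        final int positionIncrement;

        Tok(String text, int positionIncrement) {
            this.text = text;
            this.positionIncrement = positionIncrement;
        }

        @Override
        public String toString() {
            return text + "(+" + positionIncrement + ")";
        }
    }

    // Drops stop words; each dropped word adds one to the next kept token's increment.
    static List<Tok> filter(List<String> words, Set<String> stopWords) {
        List<Tok> out = new ArrayList<>();
        int increment = 1;
        for (String word : words) {
            if (stopWords.contains(word)) {
                increment++;                    // remember the gap left by the stop word
            } else {
                out.add(new Tok(word, increment));
                increment = 1;                  // reset for the next kept token
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> stop = new HashSet<>(Arrays.asList("of", "the"));
        // prints [climate(+1), control(+2)]
        System.out.println(filter(Arrays.asList("climate", "of", "control"), stop));
    }
}
```

With the increments recorded this way, "control" sits two positions after "climate", so an index that honors increments no longer sees the two words as adjacent.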
Erik Hatcher wrote:
On Jun 15, 2005, at 12:12 PM, Mike Barry wrote:
I have a situation where a query such as "climate control" is returning
documents with the phrase "climate of control". (I'm using QueryParser).
After searching, I found a similar issue on the mailing list from
Greg Robertson with a patch from Steve Rowe.
Looking at the source repository for StopFilter.java, the patch was applied
in November of 2003 and then reverted in Dec 2003 (by Erik), with the note:
revert position increment change due to conflict with PhraseQuery
(the patch incremented the token position to inhibit exact matching across
removed stopword(s)).
I couldn't find any info on how/why this approach conflicted with PhraseQuery.
Can anyone enlighten me on this? Does anyone know of a way to inhibit
exact matching across removed stopword(s)?
PhraseQuery originally did not account for gaps left in the terms of the phrase.
PhraseQuery was modified last year to allow for this though:
r150509 | goller | 2004-09-15 05:38:50 -0400 (Wed, 15 Sep 2004) | 5 lines
PhraseQuery and PhrasePrefixQuery are extended. It's now
possible to specify the relative position of a term within
a phrase. This allows gaps and multiple terms at the same
position.
-----
So we could change StopFilter to put the gaps back in safely now, I think.
Thoughts?
Erik
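As a rough illustration of why gaps matter for phrase matching, the sketch below checks a phrase against term positions, with each phrase term given a relative offset as the r150509 log message describes. It is a simplified stand-in, not Lucene's PhraseQuery: PhraseGapSketch and matches are hypothetical names, and the document is modelled as a plain array in which a removed stop word leaves a null slot.

```java
// Standalone illustration only: not Lucene's PhraseQuery implementation.
class PhraseGapSketch {

    // Returns true if, for some start position, every phrase term is found at
    // start + its offset. docTerms[i] is the term at absolute position i, or
    // null where a stop word was removed. Offsets are assumed non-negative.
    static boolean matches(String[] docTerms, String[] phraseTerms, int[] offsets) {
        for (int start = 0; start < docTerms.length; start++) {
            boolean ok = true;
            for (int i = 0; i < phraseTerms.length; i++) {
                int pos = start + offsets[i];
                if (pos >= docTerms.length || !phraseTerms[i].equals(docTerms[pos])) {
                    ok = false;
                    break;
                }
            }
            if (ok) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // "climate of control" indexed with the stop word's position left empty.
        String[] doc = {"climate", null, "control"};
        String[] phrase = {"climate", "control"};
        System.out.println(matches(doc, phrase, new int[] {0, 1})); // prints false
        System.out.println(matches(doc, phrase, new int[] {0, 2})); // prints true
    }
}
```

Under these assumptions the exact phrase "climate control" (offsets 0 and 1) does not match the document, while a phrase that declares the one-position gap (offsets 0 and 2) does.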
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]