Hayden Muhl created LUCENE-5211:
-----------------------------------
Summary: StopFilterFactory does not honor comments
Key: LUCENE-5211
URL: https://issues.apache.org/jira/browse/LUCENE-5211
Project: Lucene - Core
Issue Type: Bug
Components: core/search
Affects Versions: 4.2
Reporter: Hayden Muhl
The StopFilterFactory builds a CharArraySet directly from the raw lines of the
supplied words file. This causes a problem when using the stop word files
supplied with the Solr/Lucene distribution. In particular, the comments in
those files are added to the CharArraySet along with the words. For example,
this line:
ceci | this
should result in the string "ceci" being added to the CharArraySet, but
"ceci | this" is what actually gets added.
Workaround: Remove all comments from stop word files you are using.
Suggested fix: The StopFilterFactory should strip any comment from each line,
then strip trailing whitespace, before adding the word to the CharArraySet. The
stop word files supplied with the distribution should be edited to conform to
the supported comment format.
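A minimal sketch of the suggested fix, in plain Java. It assumes "|" as the comment marker, as in the "ceci | this" example. It collects words into a java.util.Set rather than Lucene's CharArraySet so it stays self-contained. The class and method names (StopWordLines, stripComment, parse) are hypothetical and are not part of the Lucene API. Lines that are empty after stripping (such as full-line comments) are skipped.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper illustrating the suggested fix; not Lucene API.
class StopWordLines {
    // Assumed comment marker, matching the "ceci | this" example.
    private static final char COMMENT_MARKER = '|';

    /** Drops everything from the comment marker onward, then trailing whitespace. */
    static String stripComment(String line) {
        int idx = line.indexOf(COMMENT_MARKER);
        int end = idx >= 0 ? idx : line.length();
        while (end > 0 && Character.isWhitespace(line.charAt(end - 1))) {
            end--;
        }
        return line.substring(0, end);
    }

    /** Builds the stop word set from raw file lines, skipping lines left empty. */
    static Set<String> parse(List<String> lines) {
        Set<String> words = new LinkedHashSet<>();
        for (String line : lines) {
            String word = stripComment(line);
            if (!word.isEmpty()) {
                words.add(word);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            " | A French stop word list.",
            "ceci | this",
            "cela",
            "");
        System.out.println(parse(lines)); // prints [ceci, cela]
    }
}
```

With this approach, "ceci | this" contributes only "ceci", and comment-only lines contribute nothing.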