Alan Woodward updated LUCENE-7444:
----------------------------------
Summary: Remove English stopwords default from StandardAnalyzer in
Lucene-Core (was: Remove StopFilter from StandardAnalyzer in Lucene-Core)
> Remove English stopwords default from StandardAnalyzer in Lucene-Core
> ---------------------------------------------------------------------
>
> Key: LUCENE-7444
> URL: https://issues.apache.org/jira/browse/LUCENE-7444
> Project: Lucene - Core
> Issue Type: Task
> Components: core/other, modules/analysis
> Affects Versions: 6.2
> Reporter: Uwe Schindler
> Priority: Major
> Attachments: LUCENE-7444.patch
>
>
> Yonik said on LUCENE-7318:
> {quote}
> bq. I think it would make a good default for most Lucene users, and we should
> graduate it from the analyzers module into core, and make it the default for
> IndexWriter.
> This "StandardAnalyzer" is specific to English, as it removes English
> stopwords.
> That seems to be an odd choice now for a few reasons:
> - It was argued in the past (rather vehemently) that Solr should not prefer
> English in its default "text" field
> - AFAIK, removing stopwords is no longer considered best practice.
> Given that removal of English stopwords is the only thing that really makes
> this analyzer English-centric (and given the negative impact that can have on
> other languages), it seems like the stopword filter should be removed from
> StandardAnalyzer.
> {quote}
> When trying to fix the backwards incompatibility issues in LUCENE-7318, it
> looks like most of the unrelated code that was moved from the analysis module
> to core (with changed package names!!!! :( ) was related to word list
> loading, CharArraySets, and superclasses of StopFilter. If we follow Yonik's
> suggestion, we can revert all those changes. I agree with him: a "universal"
> analyzer should not have any language-specific stop-words.
> The other thing is LowerCaseFilter, but I'd suggest simply adding a clone of
> it to Lucene core and leaving the analysis module self-contained.
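
To illustrate the "negative impact on other languages" point, here is a minimal, self-contained sketch. It does not use the Lucene API: the class StopwordDemo, the method analyze, and the short stopword set are hypothetical stand-ins for a tokenizer + lowercase + stop filter chain, assumed only for illustration. It shows how an English stopword default can silently drop meaningful German words such as "was" (what) and "will" (wants).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class StopwordDemo {
    // Hypothetical subset of an English stopword list, for illustration only.
    static final Set<String> ENGLISH_STOPWORDS =
            Set.of("a", "an", "and", "in", "is", "or", "the", "to", "was", "will");

    // Splits on runs of non-letter characters, lowercases each token,
    // and drops every token contained in the given stopword set.
    static List<String> analyze(String text, Set<String> stopwords) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.split("[^\\p{L}]+")) {
            if (raw.isEmpty()) {
                continue; // a leading separator yields an empty first element
            }
            String token = raw.toLowerCase(Locale.ROOT);
            if (!stopwords.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String german = "Was will er in Berlin";
        // English stopword default: most of the German query disappears.
        System.out.println(analyze(german, ENGLISH_STOPWORDS)); // [er, berlin]
        // Empty stopword set: every token is kept.
        System.out.println(analyze(german, Set.of())); // [was, will, er, in, berlin]
    }
}
```

With an empty stopword set the same chain behaves identically for every language, which is the behaviour being proposed here for the default analyzer.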