Thanks, Eric. I created https://issues.apache.org/jira/browse/SOLR-17346 and opened a PR against it.
Alastair

On Tue, 11 Jun 2024 at 14:02, Eric Pugh <ep...@opensourceconnections.com> wrote:

> It makes sense to me to keep the two sets aligned! Please do open a JIRA
> and a PR.
>
> > On Jun 11, 2024, at 5:42 AM, Alastair Porter <alast...@porter.net.nz> wrote:
> >
> > Hello,
> > I see that the stopwords_fr.txt list included in the Solr default configset
> > is "out of date" compared to the same file in Lucene.
> >
> > https://github.com/apache/solr/blob/a42c605fb916439222a086356f368f02cf80304a/solr/server/solr/configsets/_default/conf/lang/stopwords_fr.txt
> > https://github.com/apache/lucene/blame/main/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/french_stop.txt
> >
> > Specifically, I was running into issues where I am trying to index été
> > (summer), and it was being removed due to it also being a conjugation of
> > être ("to be").
> > It appears that the snowball list
> > (https://snowballstem.org/algorithms/french/stop.txt) has been updated to
> > resolve this specific issue, and by looking at the commit history in the
> > Lucene repository this happened many years ago
> > (https://issues.apache.org/jira/browse/LUCENE-9354).
> >
> > Does it make sense to also update this list in Solr? I have an Apache JIRA
> > account and so would be happy to raise the necessary issue and make a
> > contribution for this update if it can help speed up the process.
> >
> > Regards,
> > Alastair
>
> _______________________
> Eric Pugh | Founder | OpenSource Connections, LLC | 434.466.1467 |
> http://www.opensourceconnections.com | My Free/Busy <http://tinyurl.com/eric-cal>
> Co-Author: Apache Solr Enterprise Search Server, 3rd Ed
> <https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw>
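
For anyone wanting to check how two copies of a stop word list have drifted apart, as with the Solr and Lucene French lists discussed in this thread, the following is a minimal sketch and not part of the actual patch. It assumes the Snowball file format, where `|` starts a comment and a line may hold several whitespace-separated words. The function names and the two sample excerpts are made up for illustration; in practice the text would come from the two files linked above.

```python
def parse_snowball_stopwords(text):
    """Parse a Snowball-format stop word list into a set of words."""
    words = set()
    for line in text.splitlines():
        # Everything after the first '|' is treated as a comment.
        entry = line.split("|", 1)[0]
        # A line may contain zero or more whitespace-separated words.
        words.update(entry.split())
    return words


def compare_stopwords(old_text, new_text):
    """Return (removed, added): words only in the old list, words only in the new."""
    old = parse_snowball_stopwords(old_text)
    new = parse_snowball_stopwords(new_text)
    return sorted(old - new), sorted(new - old)


# Hypothetical excerpts, NOT the real file contents.
OLD_SAMPLE = """\
 | forms of etre (illustrative excerpt)
suis
es
été            | past participle
"""

NEW_SAMPLE = """\
 | forms of etre (illustrative excerpt)
suis
es
 | été   (commented out here purely for illustration)
"""

removed, added = compare_stopwords(OLD_SAMPLE, NEW_SAMPLE)
print("only in old list:", removed)
print("only in new list:", added)
```

Running the same comparison on the real stopwords_fr.txt and french_stop.txt would list every entry that differs, which may be a quick sanity check before or after syncing the files.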