Thanks, Eric. I created https://issues.apache.org/jira/browse/SOLR-17346
and opened a PR against it.
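
For anyone who wants to see the été problem concretely, here is a minimal Python sketch. It assumes both files use the Snowball stopword format, where '|' starts a comment and a line may hold several words. The two lists below are hypothetical, abridged excerpts, not the real contents of stopwords_fr.txt or french_stop.txt. The function names are illustrative rather than Solr or Lucene APIs, and the case-insensitive lookup only loosely mimics StopFilterFactory with ignoreCase="true".

```python
from __future__ import annotations

from typing import Iterable

# Hypothetical, abridged excerpts in Snowball stopword format ('|' starts a
# comment). These are NOT the real file contents, only an illustration.
OLD_LIST = """
 | French stop words (illustrative excerpt)
le             | the
la             | the
été            | past participle of être (but also "summer")
"""

NEW_LIST = """
 | French stop words (illustrative excerpt)
le             | the
la             | the
"""


def parse_snowball_stopwords(text: str) -> set[str]:
    """Collect words from a Snowball-format list: everything after '|' is a
    comment, and a line may contain several whitespace-separated words."""
    words: set[str] = set()
    for line in text.splitlines():
        content = line.split("|", 1)[0]  # keep only the part before '|'
        words.update(w.lower() for w in content.split())
    return words


def filter_stopwords(tokens: Iterable[str], stopwords: set[str]) -> list[str]:
    """Drop tokens that appear in the stopword set (case-insensitive)."""
    return [t for t in tokens if t.lower() not in stopwords]


old_words = parse_snowball_stopwords(OLD_LIST)
new_words = parse_snowball_stopwords(NEW_LIST)

if __name__ == "__main__":
    print("only in old list:", sorted(old_words - new_words))  # ['été']
    print("only in new list:", sorted(new_words - old_words))  # []
    tokens = ["le", "festival", "été"]
    print(filter_stopwords(tokens, old_words))  # ['festival']
    print(filter_stopwords(tokens, new_words))  # ['festival', 'été']
```

To compare the real files, one could read each with open(path, encoding="utf-8"), pass the text to parse_snowball_stopwords, and print the two set differences.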

Alastair

On Tue, 11 Jun 2024 at 14:02, Eric Pugh <ep...@opensourceconnections.com> wrote:

> It makes sense to me to keep the two sets aligned!  Please do open a JIRA
> and a PR.
>
> > On Jun 11, 2024, at 5:42 AM, Alastair Porter <alast...@porter.net.nz> wrote:
> >
> > Hello,
> > I see that the stopwords_fr.txt list included in the Solr default
> > configset is "out of date" compared to the equivalent file in Lucene:
> >
> > https://github.com/apache/solr/blob/a42c605fb916439222a086356f368f02cf80304a/solr/server/solr/configsets/_default/conf/lang/stopwords_fr.txt
> >
> > https://github.com/apache/lucene/blame/main/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/french_stop.txt
> >
> > Specifically, I ran into an issue when trying to index été (summer): it
> > was being removed because it is also the past participle of être ("to be").
> > It appears that the Snowball list
> > (https://snowballstem.org/algorithms/french/stop.txt) has been updated to
> > resolve this specific issue, and the commit history in the Lucene
> > repository shows that Lucene picked up the change many years ago
> > (https://issues.apache.org/jira/browse/LUCENE-9354).
> >
> > Does it make sense to update this list in Solr as well? I have an Apache
> > JIRA account, so I would be happy to raise the necessary issue and
> > contribute the update if that helps speed up the process.
> >
> > Regards,
> > Alastair
>
> _______________________
> Eric Pugh | Founder | OpenSource Connections, LLC | 434.466.1467 |
> http://www.opensourceconnections.com | My Free/Busy: http://tinyurl.com/eric-cal
> Co-Author: Apache Solr Enterprise Search Server, 3rd Ed:
> https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw
