It makes sense to me to keep the two sets aligned! Please do open a JIRA and a PR.
> On Jun 11, 2024, at 5:42 AM, Alastair Porter <alast...@porter.net.nz> wrote:
> 
> Hello,
> I see that the stopwords_fr.txt list included in the Solr default configset
> is out of date compared to the same file in Lucene.
> https://github.com/apache/solr/blob/a42c605fb916439222a086356f368f02cf80304a/solr/server/solr/configsets/_default/conf/lang/stopwords_fr.txt
> https://github.com/apache/lucene/blame/main/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/french_stop.txt
> 
> Specifically, I ran into an issue when trying to index "été" (summer): it
> was being removed because it is also a conjugated form of "être" ("to be").
> It appears that the Snowball list (
> https://snowballstem.org/algorithms/french/stop.txt) was updated to
> resolve this specific issue, and the commit history in the Lucene
> repository shows that this change was made many years ago (
> https://issues.apache.org/jira/browse/LUCENE-9354)
> 
> Does it make sense to also update this list in Solr? I have an Apache JIRA
> account, so I would be happy to raise the necessary issue and contribute
> this update if it helps speed up the process.
> 
> Regards,
> Alastair

_______________________
Eric Pugh | Founder | OpenSource Connections, LLC | 434.466.1467 | 
http://www.opensourceconnections.com <http://www.opensourceconnections.com/> | 
My Free/Busy <http://tinyurl.com/eric-cal>  
Co-Author: Apache Solr Enterprise Search Server, 3rd Ed 
<https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw>
This e-mail and all contents, including attachments, is considered to be 
Company Confidential unless explicitly stated otherwise, regardless of whether 
attachments are marked as such.