Let me just clarify things a bit...
I think the concern here is that FrenchMinimalStemmer removes the
last "digit" from a token because it does not check whether the
character is a letter.
e.g., "123455" is trimmed to "12345" by FrenchMinimalStemmer.
To me, this behaviour is beyond stem
I'm not so sure. I think the whole idea of having both stemmers is that the
minimal one does less than the light one.
Removing the final character of a double-letter suffix is going to
sacrifice some precision. For example, mes/mess and ne/née; I'm sure there
are others.
So having both options is hel
I found an issue that added the isLetter() check to FrenchLightStemmer:
https://issues.apache.org/jira/browse/LUCENE-4063
It seems the same change has not been applied to FrenchMinimalStemmer.
Would it be a good idea to add the same check to it, to avoid overly
aggressive stemming?
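
To make the proposal concrete, here is a minimal sketch of the kind of
guard I mean. It is not the actual Lucene source: the class
DoubledSuffixGuard and both method names are hypothetical, and it models
only the "drop the last character if it repeats the previous one" step,
not the rest of the stemmer.

```java
// Hypothetical illustration only; not copied from Lucene.
public final class DoubledSuffixGuard {
    private DoubledSuffixGuard() {}

    /** Unguarded: drops the final char whenever it equals the one before it. */
    public static int trimUnguarded(char[] s, int len) {
        if (len > 1 && s[len - 1] == s[len - 2]) {
            return len - 1;
        }
        return len;
    }

    /** Guarded: same rule, but only when the final char is a letter. */
    public static int trimGuarded(char[] s, int len) {
        if (len > 1
                && Character.isLetter(s[len - 1])
                && s[len - 1] == s[len - 2]) {
            return len - 1;
        }
        return len;
    }

    public static void main(String[] args) {
        for (String token : new String[] {"123455", "mess"}) {
            char[] s = token.toCharArray();
            String unguarded = new String(s, 0, trimUnguarded(s, s.length));
            String guarded = new String(s, 0, trimGuarded(s, s.length));
            System.out.println(token + " -> unguarded: " + unguarded
                    + ", guarded: " + guarded);
        }
    }
}
```

With the guard, a numeric token such as "123455" keeps its full length,
while a doubled letter as in "mess" is still trimmed, so the behaviour
for real words is unchanged.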
Tomoko
2019年7月2
Hi Adrien,
To me, it simply sounds like a bug. Can you please open a JIRA issue
(with a patch if possible)?
Tomoko
On Tue, Jul 23, 2019 at 22:05, Adrien Gallou wrote:
>
> Hi,
>
> I'm using both light and minimal French stemmers and encountered an issue
> when using the minimal stemmer.
>
> The light stemmer removes the la