Re: Question about the light and minimal French stemmers

2019-07-27 Thread Tomoko Uchida
Let me just make things a bit clearer... I think the concern here is that FrenchMinimalStemmer removes the last "digit" from a token because it does not check whether the character is a letter. For example, "123455" is trimmed to "12345" by FrenchMinimalStemmer. To me, this behaviour is beyond stem

Re: Question about the light and minimal French stemmers

2019-07-27 Thread Michael Sokolov
I'm not so sure. I think the whole idea of having both stemmers is that the minimal one does less than the light one. Removing the final character of a double-letter suffix is going to sacrifice some precision. For example, mes/mess, ne/née; I'm sure there are others. So having both options is hel

Re: Question about the light and minimal French stemmers

2019-07-27 Thread Tomoko Uchida
I found an issue that adds the isLetter() check to FrenchLightStemmer: https://issues.apache.org/jira/browse/LUCENE-4063 It seems the same change has not been applied to FrenchMinimalStemmer. Would it be a good idea to add the same check there to avoid overly aggressive stemming? Tomoko 2019年7月2
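To make the proposal concrete, here is a minimal, self-contained sketch of a "drop the doubled final character" rule with and without a letter check. It is a simplified illustration and not the Lucene source: the class DoubledSuffixSketch and the methods trimDoubled and apply are hypothetical names, and the real stemmers apply other rules that are omitted here. The only library call used is java.lang.Character.isLetter.

```java
// Simplified illustration only; this is NOT the Lucene source.
// DoubledSuffixSketch, trimDoubled and apply are hypothetical names.
class DoubledSuffixSketch {

    // Drops the last character when it repeats the one before it.
    // With lettersOnly == true, the rule fires only if that character is a letter.
    static int trimDoubled(char[] s, int len, boolean lettersOnly) {
        if (len < 2) {
            return len;
        }
        boolean doubled = s[len - 1] == s[len - 2];
        boolean allowed = !lettersOnly || Character.isLetter(s[len - 1]);
        return (doubled && allowed) ? len - 1 : len;
    }

    // Convenience wrapper that returns the trimmed token as a String.
    static String apply(String token, boolean lettersOnly) {
        char[] buf = token.toCharArray();
        int newLen = trimDoubled(buf, buf.length, lettersOnly);
        return new String(buf, 0, newLen);
    }

    public static void main(String[] args) {
        System.out.println(apply("123455", false)); // unguarded rule: the last digit is removed
        System.out.println(apply("123455", true));  // guarded rule: the token is left unchanged
    }
}
```

With the guard in place, numeric tokens such as "123455" pass through untouched, while doubled letters are still trimmed.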

Re: Question about the light and minimal French stemmers

2019-07-27 Thread Tomoko Uchida
Hi Adrien, To me, it simply sounds like a bug. Can you please open a JIRA (with a patch if possible)? Tomoko On Tue, Jul 23, 2019 at 22:05, Adrien Gallou wrote: > > Hi, > > I'm using both light and minimal French stemmers and encountered an issue > when using the minimal stemmer. > > The light stemmer removes the la