Perhaps "least frequent substring" or even "suffix truncation" might be enough for your needs.
Here is a related paper: http://web.jhu.edu/bin/q/b/p75-mcnamee.pdf karl On Jun 8, 2011, at 1:52 PM, Mohamed Yahya wrote: > You're right. Still, I am not sure if there is a library that would > take care of examples such as the one I gave. > > On Wed, Jun 8, 2011 at 11:25, Lahiru Samarakoon <lahir...@gmail.com> wrote: >> Hi, >> >>> >>> Is there something in Lucene that supports lemmatization of the following >>> form: >>> >>> Mexican --> Mexico (from adjective to name/noune) >>> >>> Lemmatization do not change part of speech. I think you are looking for a >> stemming algorithm. >> >> http://nlp.stanford.edu/IR-book/html/htmledition/stemming-and-lemmatization-1.html >> >>> >>> >>> Thanks, >> Lahiru >> > > > > -- > Mohamed Yahya > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org