Perhaps "least frequent substring" or even "suffix truncation" might be enough 
for your needs.

Here is a related paper: http://web.jhu.edu/bin/q/b/p75-mcnamee.pdf


        karl



On Jun 8, 2011, at 1:52 PM, Mohamed Yahya wrote:

> You're right. Still, I am not sure if there is a library that would
> take care of examples such as the one I gave.
> 
> On Wed, Jun 8, 2011 at 11:25, Lahiru Samarakoon <lahir...@gmail.com> wrote:
>> Hi,
>> 
>>> 
>>> Is there something in Lucene that supports lemmatization of the following
>>> form:
>>> 
>>> Mexican --> Mexico (from adjective to name/noune)
>>> 
>>> Lemmatization do not change part of speech. I think you are looking for a
>> stemming algorithm.
>> 
>> http://nlp.stanford.edu/IR-book/html/htmledition/stemming-and-lemmatization-1.html
>> 
>>> 
>>> 
>>> Thanks,
>> Lahiru
>> 
> 
> 
> 
> -- 
> Mohamed  Yahya
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

Reply via email to