Uwe, what KK needs here is proper Unicode handling. Since the latest WordDelimiterFilter has pretty good handling of Unicode categories, combining it with WhitespaceTokenizer effectively gives you a good solution for Unicode tokenization.
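[Editor's note: a minimal sketch of the chain described above, written against the Lucene 2.x / Solr 1.3-era APIs. The class name is illustrative, and the int-flag constructor of WordDelimiterFilter changed across Solr versions, so treat the argument list as an assumption, not the definitive signature.]

    import java.io.Reader;

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.WhitespaceTokenizer;
    import org.apache.solr.analysis.WordDelimiterFilter;

    /** Whitespace tokenization plus Unicode-aware sub-splitting. */
    public class UnicodeTokenizingAnalyzer extends Analyzer {
      @Override
      public TokenStream tokenStream(String fieldName, Reader reader) {
        // Split on whitespace only; WordDelimiterFilter then sub-splits
        // each token on Unicode category transitions (letter/digit
        // boundaries, case changes, intra-word delimiters).
        TokenStream stream = new WhitespaceTokenizer(reader);
        // Assumed Solr 1.3-era constructor flags, in order:
        // generateWordParts=1, generateNumberParts=1,
        // catenateWords=0, catenateNumbers=0, catenateAll=0
        stream = new WordDelimiterFilter(stream, 1, 1, 0, 0, 0);
        return stream;
      }
    }

Dropping this class into an IndexWriter or QueryParser in place of StandardAnalyzer keeps the whole chain dependent only on the Lucene and Solr core jars.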
KK doesn't need detection of anything: the Porter stem filter will simply leave the Indic text alone, so it will just work.

On Thu, Jun 4, 2009 at 8:40 AM, Uwe Schindler <u...@thetaphi.de> wrote:

> > I request Uwe to give me some more ideas on using the analyzers from Solr
> > that will do the job for me, handling a mix of both English and
> > non-English content.
>
> Look here:
>
> http://lucene.apache.org/solr/api/org/apache/solr/analysis/package-summary.html
>
> As you see, the Solr analyzers are just standard Lucene analyzers, so you
> can drop the Solr core jar into your project and just use them :-)
>
> Currently I am not sure which analyzer Robert means that can do English
> stemming while leaving the non-English parts alone, but you can look for
> it there.
>
> Uwe

--
Robert Muir
rcm...@gmail.com
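[Editor's note: a hedged sketch of the pass-through behavior Robert describes, again against the Lucene 2.x Analyzer API; the class name is mine, not from the thread. The Porter stemmer's suffix rules are defined only over the letters a-z, so tokens in Devanagari or other Indic scripts come out unchanged and no language-detection step is needed.]

    import java.io.Reader;

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.LowerCaseFilter;
    import org.apache.lucene.analysis.PorterStemFilter;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.WhitespaceTokenizer;

    /** English stemming over mixed-language text. */
    public class MixedLanguageAnalyzer extends Analyzer {
      @Override
      public TokenStream tokenStream(String fieldName, Reader reader) {
        TokenStream stream = new WhitespaceTokenizer(reader);
        stream = new LowerCaseFilter(stream);
        // Rewrites only tokens matching English suffix patterns;
        // Indic tokens pass through this filter untouched.
        stream = new PorterStemFilter(stream);
        return stream;
      }
    }

With this chain, a mixed input such as "running" followed by a Hindi word should yield the stemmed token "run" alongside the Hindi token exactly as it appeared in the text.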