Shai, again the problem is not really performance (I am ignoring that for
now), but the fact that lowercasing and case folding are different.

An easy example: the lowercase of ß is ß itself, since it is already lowercase.
So it will not match 'SS' if you use a lowercase filter ('SS' lowercases to 'ss').

If you use case folding, these two will match, because both fold to 'ss'.
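
To make the difference concrete, here is a minimal sketch in Python, used only because its str.casefold() is a built-in full Unicode case folding. The two helper names are made up for illustration and are not Lucene APIs. In Java you would need something like ICU4J's UCharacter.foldCase, which I am assuming is available on your classpath.

```python
def matches_lowercase(a: str, b: str) -> bool:
    """Compare two terms after plain lowercasing (what a lowercase filter does)."""
    return a.lower() == b.lower()


def matches_casefold(a: str, b: str) -> bool:
    """Compare two terms after full Unicode case folding."""
    return a.casefold() == b.casefold()


if __name__ == "__main__":
    # ß is already lowercase, so lowercasing leaves it alone; 'SS' becomes 'ss'.
    print(matches_lowercase("ß", "SS"))   # False
    # Case folding maps ß to 'ss', so both sides end up as 'ss'.
    print(matches_casefold("ß", "SS"))    # True

    # Greek final sigma: lowercasing 'ΟΔΟΣ' yields a final 'ς',
    # which differs from a term that was written with 'σ'.
    print(matches_lowercase("ΟΔΟΣ", "οδοσ"))  # False
    # Case folding maps both sigma forms to 'σ'.
    print(matches_casefold("ΟΔΟΣ", "οδοσ"))   # True
```

The point is that folding is defined for matching, while lowercasing is defined for display, so only folding gives you a single form to index and query against.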

On Mon, Nov 30, 2009 at 2:53 PM, Shai Erera <ser...@gmail.com> wrote:

> Robert, what if I need to do additional filtering after CollationKeyFilter,
> like stopwords removal, abbreviations handling, stemming etc? Will that be
> possible if I use CollationKeyFilter?
>
> I also noticed CKF creates a String out of the char[]. If the code already
> does that, why not use String.toLowerCase(Locale)?
>
> Shai
>
> On Mon, Nov 30, 2009 at 9:46 PM, Simon Willnauer <
> simon.willna...@googlemail.com> wrote:
>
> > On Mon, Nov 30, 2009 at 8:08 PM, Robert Muir <rcm...@gmail.com> wrote:
> > >> I am not sure if it is worth to add a new TokenFilter for Turkish
> > language.
> > >> I see there exist GreekLowerCaseFilter and RussianLowerCaseFilter. It
> > would
> > >> be nice to see TurkishLowerCaseFilter in Lucene.
> > >>
> > >>
> > >>
> > > just to clarify, GreekLowerCaseFilter really shouldn't exist either.
> The
> > > final sigma problem it has (where there are two lowercase forms
> depending
> > > upon position in word), this is also solved with unicode case folding
> or
> > > collation. This is a perfect example of how lowercase is the wrong
> > operation
> > > for search.
> > >
> > > and RussianLowerCaseFilter is deprecated now, it does the exact same
> > thing
> > > as LowerCaseFilter.
> > btw. we should fix supplementary chars in there too even if it is
> > deprecated.
> >
> > >
> > > --
> > > Robert Muir
> > > rcm...@gmail.com
> > >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: java-user-h...@lucene.apache.org
> >
> >
>



-- 
Robert Muir
rcm...@gmail.com
