Re: LowerCaseFilter fails one letter (I) of Turkish alphabet

Shai Erera Mon, 30 Nov 2009 11:53:55 -0800

Robert, what if I need to do additional filtering after CollationKeyFilter,
such as stopword removal, abbreviation handling, stemming, etc.? Will that
still be possible if I use CollationKeyFilter?

I also noticed that CollationKeyFilter (CKF) creates a String out of the
char[]. If the code already does that, why not use
String.toLowerCase(Locale)?
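
To make the question concrete, here is a minimal sketch of what I have in
mind. The class and method names are only for illustration and are not Lucene
API; the only library call assumed is the JDK's String.toLowerCase(Locale),
and a real filter would of course work on the token's term buffer rather than
on standalone Strings.

```java
import java.util.Locale;

public class TurkishLowercaseSketch {

    // Assumption: "tr-TR" is the locale we want for Turkish casing rules.
    private static final Locale TURKISH = Locale.forLanguageTag("tr-TR");

    // Hypothetical helper: lowercase a term with Turkish rules, so that
    // capital I (U+0049) maps to dotless i (U+0131) and capital dotted
    // I (U+0130) maps to plain i (U+0069).
    public static String lowerTurkish(String term) {
        return term.toLowerCase(TURKISH);
    }

    // Locale-independent lowercasing, shown for comparison: capital I
    // (U+0049) maps to plain i (U+0069).
    public static String lowerRoot(String term) {
        return term.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String term = "ISPARTA";
        // Locale-independent result starts with a plain i.
        System.out.println(lowerRoot(term));
        // Turkish result starts with a dotless i (U+0131).
        System.out.println(lowerTurkish(term));
    }
}
```

The two results differ only in the first letter, which is exactly the letter
the subject line is about.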

Shai

On Mon, Nov 30, 2009 at 9:46 PM, Simon Willnauer <
simon.willna...@googlemail.com> wrote:

> On Mon, Nov 30, 2009 at 8:08 PM, Robert Muir <rcm...@gmail.com> wrote:
> >> I am not sure if it is worth to add a new TokenFilter for Turkish language.
> >> I see there exist GreekLowerCaseFilter and RussianLowerCaseFilter. It would
> >> be nice to see TurkishLowerCaseFilter in Lucene.
> >>
> >>
> >>
> > just to clarify, GreekLowerCaseFilter really shouldn't exist either. The
> > final sigma problem it has (where there are two lowercase forms depending
> > upon position in word), this is also solved with unicode case folding or
> > collation. This is a perfect example of how lowercase is the wrong operation
> > for search.
> >
> > and RussianLowerCaseFilter is deprecated now, it does the exact same thing
> > as LowerCaseFilter.
> btw. we should fix supplementary chars in there too even if it is
> deprecated.
>
> >
> > --
> > Robert Muir
> > rcm...@gmail.com
> >
>
