Shai, again, the problem is not really performance (I am ignoring that for now); it is that lowercasing and case folding are different operations.
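A small, JDK-only Java sketch of the difference (an illustration, not Lucene code). `lower` stands in for what a lowercase filter does to each token. `approxFold` is a hypothetical helper that approximates full case folding by uppercasing first and then lowercasing. It is NOT Unicode case folding as defined in CaseFolding.txt; in Java that is usually done with ICU4J's `UCharacter.foldCase`. The class name `CaseFoldDemo` is also hypothetical.

```java
import java.util.Locale;

public class CaseFoldDemo {
    // Plain lowercasing, roughly what a lowercase token filter does.
    static String lower(String s) {
        return s.toLowerCase(Locale.ROOT);
    }

    // Hypothetical JDK-only approximation of full case folding (not the
    // Unicode CaseFolding.txt mapping): uppercasing first expands the German
    // sharp s to "SS" and maps both Greek sigmas to capital sigma, so the
    // lowercasing that follows gives equivalent spellings the same form.
    static String approxFold(String s) {
        return s.toUpperCase(Locale.ROOT).toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String[][] pairs = {
            // "STRASSE" vs "strasse" spelled with sharp s (U+00DF)
            {"STRASSE", "stra\u00dfe"},
            // the same Greek word ending in medial sigma (U+03C3) vs final sigma (U+03C2)
            {"\u03bf\u03b4\u03bf\u03c3", "\u03bf\u03b4\u03bf\u03c2"}
        };
        for (String[] p : pairs) {
            boolean lowerMatch = lower(p[0]).equals(lower(p[1]));
            boolean foldMatch = approxFold(p[0]).equals(approxFold(p[1]));
            System.out.printf("%s / %s: lower=%b fold=%b%n", p[0], p[1], lowerMatch, foldMatch);
        }
    }
}
```

For both pairs the lowercase comparison is false and the approximate fold comparison is true, which is the ß/SS mismatch described below and the Greek final-sigma problem discussed further down the thread.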
An easy example: the lowercase of ß is ß itself, because it is already lowercase, so it will not match 'SS' if you use a lowercase filter. If you use case folding, the two will match.

On Mon, Nov 30, 2009 at 2:53 PM, Shai Erera <ser...@gmail.com> wrote:
> Robert, what if I need to do additional filtering after CollationKeyFilter, like stopwords removal, abbreviations handling, stemming etc? Will that be possible if I use CollationKeyFilter?
>
> I also noticed CKF creates a String out of the char[]. If the code already does that, why not use String.toLowerCase(Locale)?
>
> Shai
>
> On Mon, Nov 30, 2009 at 9:46 PM, Simon Willnauer <simon.willna...@googlemail.com> wrote:
>
> > On Mon, Nov 30, 2009 at 8:08 PM, Robert Muir <rcm...@gmail.com> wrote:
> > >> I am not sure if it is worth to add a new TokenFilter for Turkish language.
> > >> I see there exist GreekLowerCaseFilter and RussianLowerCaseFilter. It would be nice to see TurkishLowerCaseFilter in Lucene.
> > >>
> > > just to clarify, GreekLowerCaseFilter really shouldn't exist either. The final sigma problem it has (where there are two lowercase forms depending upon position in word), this is also solved with unicode case folding or collation. This is a perfect example of how lowercase is the wrong operation for search.
> > >
> > > and RussianLowerCaseFilter is deprecated now, it does the exact same thing as LowerCaseFilter.
> > btw. we should fix supplementary chars in there too even if it is deprecated.
> > >
> > > --
> > > Robert Muir
> > > rcm...@gmail.com

-- 
Robert Muir
rcm...@gmail.com