Re: LowerCaseFilter fails one letter (I) of Turkish alphabet

Robert Muir Mon, 30 Nov 2009 11:33:37 -0800

Yes, this is what I would do! The downside of using collation in your filter
chain right now is that the terms in your index will not be human-readable.
The upside is that they will both sort and search the way your users expect
for a huge list of languages.
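
To make the trade-off concrete, here is a rough sketch using only the JDK's
java.text.Collator, not Lucene itself. It assumes primary strength is what
you want (case differences ignored), and the class and method names are made
up for illustration. CollationKeyFilter does its own encoding of the key
into a term, so treat this only as a picture of why the indexed terms stop
being readable while still comparing the way users expect.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.Locale;

// Hypothetical illustration class; not part of Lucene.
final class CollationKeySketch {

    // Collator for the given locale at PRIMARY strength, which compares
    // base letters only and so typically ignores case differences.
    static Collator primaryCollator(Locale locale) {
        Collator collator = Collator.getInstance(locale);
        collator.setStrength(Collator.PRIMARY);
        return collator;
    }

    // Render key bytes as hex, to show that a key is opaque binary data.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Collator english = primaryCollator(Locale.ENGLISH);
        CollationKey upper = english.getCollationKey("Istanbul");
        CollationKey lower = english.getCollationKey("istanbul");

        // The key is a byte sequence, not the original text.
        System.out.println("key bytes: " + hex(upper.toByteArray()));
        // Equal keys mean the two spellings match at this strength.
        System.out.println("same key: " + (upper.compareTo(lower) == 0));

        // The same comparison under Turkish rules; the result depends on
        // the collation data shipped with your JDK, so it is only printed.
        Collator turkish = primaryCollator(Locale.forLanguageTag("tr"));
        System.out.println("tr compare: " + turkish.compare("Istanbul", "istanbul"));
    }
}
```

Matching on keys like these is what replaces lowercasing: two tokens match
when their keys are equal, whatever the original spelling looked like.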


On Mon, Nov 30, 2009 at 2:22 PM, AHMET ARSLAN <iori...@yahoo.com> wrote:

> > Just to clarify, GreekLowerCaseFilter really shouldn't exist either. The
> > final sigma problem it has (where there are two lowercase forms depending
> > upon position in word) is also solved with Unicode case folding or
> > collation. This is a perfect example of how lowercase is the wrong
> > operation for search.
> >
> > And RussianLowerCaseFilter is deprecated now; it does the exact same thing
> > as LowerCaseFilter.
>
> Thank you for your explanations. I just read the Javadoc of
> org.apache.lucene.collation. If I am not wrong, it is better to remove
> LowerCaseFilter completely from the analyzer chain and add CollationKeyFilter
> with the appropriate Locale right after the Tokenizer, just as in
> CollationKeyAnalyzer.

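For anyone who wants to reproduce the two lowercase problems discussed in
the quoted messages, a small JDK-only sketch follows. It calls
String.toLowerCase directly rather than any Lucene filter, and the class and
method names are invented for illustration. Non-ASCII letters are written as
Unicode escapes so the source compiles regardless of file encoding.

```java
import java.util.Locale;

// Hypothetical illustration class; not part of Lucene.
final class LowercaseSketch {

    static final Locale TURKISH = Locale.forLanguageTag("tr");

    // Locale-independent lowercasing.
    static String lowerRoot(String s) {
        return s.toLowerCase(Locale.ROOT);
    }

    // Lowercasing under Turkish rules.
    static String lowerTurkish(String s) {
        return s.toLowerCase(TURKISH);
    }

    public static void main(String[] args) {
        // Without Turkish rules, I (U+0049) becomes dotted i (U+0069).
        System.out.println(lowerRoot("I"));
        // With Turkish rules, I becomes dotless i (U+0131).
        System.out.println(lowerTurkish("I"));
        // Capital I with dot above (U+0130) becomes plain i in Turkish.
        System.out.println(lowerTurkish("\u0130"));
        // Greek capital sigma (U+03A3) has two lowercase forms: U+03C3
        // inside a word and final sigma U+03C2 at the end of a word.
        System.out.println(lowerRoot("\u03A3\u039F\u03A6\u039F\u03A3"));
    }
}
```

So the right lowercase form depends on both the locale and the position in
the word, which is why a single locale-blind lowercase step cannot get every
language right.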

-- 
Robert Muir
rcm...@gmail.com
