yes, this is what I would do! The downside to using collation in your filter chain right now, is that then your terms in the index will not be human-readable. The upside is they will both sort and search the way your users expect for a huge list of languages.
On Mon, Nov 30, 2009 at 2:22 PM, AHMET ARSLAN <iori...@yahoo.com> wrote: > > just to clarify, GreekLowerCaseFilter really shouldn't > > exist either. The > > final sigma problem it has (where there are two lowercase > > forms depending > > upon position in word), this is also solved with unicode > > case folding or > > collation. This is a perfect example of how lowercase is > > the wrong operation > > for search. > > > > and RussianLowerCaseFilter is deprecated now, it does the > > exact same thing > > as LowerCaseFilter. > > Thank you for your explanations. I just read the java-doc of > org.apache.lucene.collation. If I am not wrong it is better to remove > lowercasefilter completely from analyzer chain and add CollationKeyFilter > with appropriate Locale right after the Tokenizer. Just as in > CollationKeyAnalyzer. > > > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > > -- Robert Muir rcm...@gmail.com