Hello, there is already an issue open for this. The basic problem is that lowercasing with a locale is still not right, because it is intended for presentation (display), not for case folding.
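To make the difference concrete, here is a minimal sketch of the two JDK lowercasing paths being discussed. The class and method names are hypothetical, and the per-character loop is only an assumption about what a filter built on Character.toLowerCase() amounts to, not the actual LowerCaseFilter source. It shows display lowercasing only; it does not perform case folding.

```java
import java.util.Locale;

// Hypothetical demo class; not part of Lucene or the JDK.
public class TurkishLowercaseDemo {

    // Locale-insensitive lowercasing, one char at a time, via Character.toLowerCase().
    // Assumption: this approximates a filter that lowercases each char independently.
    static String perCharLower(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            sb.append(Character.toLowerCase(s.charAt(i)));
        }
        return sb.toString();
    }

    // Locale-sensitive lowercasing using the Turkish locale.
    // Here 'I' maps to U+0131 (LATIN SMALL LETTER DOTLESS I).
    static String turkishLower(String s) {
        return s.toLowerCase(Locale.forLanguageTag("tr"));
    }

    public static void main(String[] args) {
        String word = "ISIK";
        // Print code points so the result does not depend on console encoding.
        for (String lowered : new String[] { perCharLower(word), turkishLower(word) }) {
            StringBuilder hex = new StringBuilder();
            for (char c : lowered.toCharArray()) {
                hex.append(String.format("U+%04X ", (int) c));
            }
            System.out.println(hex.toString().trim());
        }
    }
}
```

With the Turkish locale the two I's come out as U+0131, while the per-character path yields plain U+0069. Neither result is a case-folded form suitable for matching.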
The problem is that case folding is not exposed in the JDK, and you have to use the alternate "turkish/azeri" mappings anyway.

An even better way to solve this issue, which solves more than just case distinction, is to simply use contrib/collation. In the javadocs there is a specific example of how to solve this Turkish issue this way. This is now available in Solr trunk also, and there's a test for this specific issue there, too.

On Mon, Nov 30, 2009 at 2:00 PM, AHMET ARSLAN <iori...@yahoo.com> wrote:

> In Turkish alphabet lowercase of I is not i. It is LATIN SMALL LETTER
> DOTLESS I. LowerCaseFilter which uses Character.toLowerCase() makes mistake
> just for that character.
>
> http://java.sun.com/javase/6/docs/api/java/lang/String.html#toLowerCase()
>
> I am not sure if it is worth to add a new TokenFilter for Turkish language.
> I see there exist GreekLowerCaseFilter and RussianLowerCaseFilter. It would
> be nice to see TurkishLowerCaseFilter in Lucene.
>
> Wiki recommends to ask permission from lucene committers before opening an
> issue. I can provide a patch (although it is just a one line change in
> original LowercaseFilter) for that if you want.
>
> Thank you for your consideration.
>
> Ahmet

--
Robert Muir
rcm...@gmail.com