Re: LowerCaseFilter fails one letter (I) of Turkish alphabet

Robert Muir Mon, 30 Nov 2009 11:05:42 -0800

Hello, there is already an issue open for this.

The basic problem is that lowercasing with a locale is still not quite right,
because it is intended for presentation (display), not for case folding.


The problem is that case folding is not exposed in the JDK, and you would have
to use the alternate "Turkish/Azeri" mappings anyway.
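
To make the difference concrete, here is a minimal sketch using only the JDK
(not Lucene). The class and method names are made up for illustration.
lowerPerChar() mimics the locale-insensitive Character.toLowerCase() mapping
that the original report describes, and lowerTurkish() uses the
locale-sensitive String.toLowerCase(Locale), which applies the Turkish
mapping but is still a display-oriented operation rather than real case
folding.

```java
import java.util.Locale;

public class TurkishLowerCaseDemo {

    // Locale-insensitive lowercasing, one code point at a time,
    // using Character.toLowerCase(). 'I' always becomes 'i' here.
    public static String lowerPerChar(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            sb.appendCodePoint(Character.toLowerCase(cp));
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    // Locale-sensitive lowercasing with the Turkish locale.
    // 'I' becomes U+0131 (LATIN SMALL LETTER DOTLESS I).
    public static String lowerTurkish(String s) {
        return s.toLowerCase(Locale.forLanguageTag("tr-TR"));
    }

    public static void main(String[] args) {
        System.out.println(lowerPerChar("ISPARTA")); // isparta
        System.out.println(lowerTurkish("ISPARTA")); // \u0131sparta (dotless i)
    }
}
```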

An even better way to solve this, which handles more than just case
distinctions, is to simply use contrib/collation.
The javadocs include a specific example of how to solve this Turkish
issue that way.
This is now available in Solr trunk as well, and there is a test for this
specific issue there, too.
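
As a rough illustration of the idea behind the collation approach, the
sketch below uses a plain java.text.Collator for the Turkish locale at
PRIMARY strength, so that case differences are ignored under Turkish
rules. It deliberately does not use the contrib/collation classes, and the
class and helper names are hypothetical. The strings follow the spirit of
the javadoc example and are expected to compare as equal, assuming the
JDK's Turkish collation rules treat I and dotless i as a case pair.

```java
import java.text.Collator;
import java.util.Locale;

public class TurkishCollationDemo {

    // Build a Turkish collator that only looks at primary differences,
    // which ignores case (and accents) according to Turkish rules.
    public static Collator turkishPrimary() {
        Collator collator = Collator.getInstance(Locale.forLanguageTag("tr-TR"));
        collator.setStrength(Collator.PRIMARY);
        return collator;
    }

    // True if the two strings are equal at primary strength.
    public static boolean samePrimary(String a, String b) {
        return turkishPrimary().compare(a, b) == 0;
    }

    public static void main(String[] args) {
        // "d\u0131gy" contains U+0131, LATIN SMALL LETTER DOTLESS I.
        System.out.println(samePrimary("DIGY", "d\u0131gy"));
    }
}
```

Indexing collation keys instead of lowercased terms moves the
language-specific knowledge into the collator, which is why this covers
more than just the I / dotless i case.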

On Mon, Nov 30, 2009 at 2:00 PM, AHMET ARSLAN <iori...@yahoo.com> wrote:

> In the Turkish alphabet, the lowercase of I is not i. It is LATIN SMALL
> LETTER DOTLESS I. LowerCaseFilter, which uses Character.toLowerCase(), gets
> just that character wrong.
>
> http://java.sun.com/javase/6/docs/api/java/lang/String.html#toLowerCase()
>
> I am not sure whether it is worth adding a new TokenFilter for the Turkish
> language. I see that GreekLowerCaseFilter and RussianLowerCaseFilter exist.
> It would be nice to see a TurkishLowerCaseFilter in Lucene.
>
> The wiki recommends asking permission from the Lucene committers before
> opening an issue. I can provide a patch (although it is just a one-line
> change to the original LowerCaseFilter) if you want.
>
> Thank you for your consideration.
>
> Ahmet


-- 
Robert Muir
rcm...@gmail.com
