RE: Dubious stuff spotted in LowerCaseFilter

2015-10-22 Thread Uwe Schindler
> What is the meaning of "the Unicode Policeman" ?

Robert Muir :-)

Uwe

> Thanks,
> Ahmet
>
> On Thursday, October 22, 2015 2:59 PM, Uwe Schindler wrote:
> > Hi,
> >
> > >> Setting aside the fact that Character.toLowerCase is already
> > >> dubious in some locales (e.g. Turkish),
> >

Re: Dubious stuff spotted in LowerCaseFilter

2015-10-22 Thread Ahmet Arslan
Hi Uwe,

What is the meaning of "the Unicode Policeman"?

Thanks,
Ahmet

On Thursday, October 22, 2015 2:59 PM, Uwe Schindler wrote:

Hi,

> >> Setting aside the fact that Character.toLowerCase is already dubious
> >> in some locales (e.g. Turkish),
> >
> > This is not true. Character.toLower

RE: Dubious stuff spotted in LowerCaseFilter

2015-10-22 Thread Uwe Schindler
Hi,

> >> Setting aside the fact that Character.toLowerCase is already dubious
> >> in some locales (e.g. Turkish),
> >
> > This is not true. Character.toLowerCase() works locale-independent.
> > It is only String.toLowerCase that works using default locale.

So you mean the opposite. You wanted t

Re: Dubious stuff spotted in LowerCaseFilter

2015-10-22 Thread Dawid Weiss
> LowerCaseFilter will not handle that. So whereas it is "safe" for
> English hard-coded strings, it isn't safe for all fields you might
> index in general.

This filter is a "safe" fallback that works identically regardless of the locale you have on your computer (or on the server). This, I believ

Re: Dubious stuff spotted in LowerCaseFilter

2015-10-22 Thread Trejkaz
On Thu, Oct 22, 2015 at 7:05 PM, Uwe Schindler wrote:
> Hi,
>
>> Setting aside the fact that Character.toLowerCase is already dubious in some
>> locales (e.g. Turkish),
>
> This is not true. Character.toLowerCase() works locale-independent.
> It is only String.toLowerCase that works using default

Re: Dubious stuff spotted in LowerCaseFilter

2015-10-22 Thread Dawid Weiss
Well, practice says there are no such cases...

for (int cp = Character.MIN_CODE_POINT; cp < Character.MAX_CODE_POINT; cp++) {
  int c1 = Character.charCount(cp);
  int c2 = Character.charCount(Character.toUpperCase(cp));
  int c3 = Character.charCount(Characte
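
The loop in this message is cut off in the archive preview, so what follows is only a guess at its intent, not a reconstruction of Dawid's code. It is a minimal sketch that scans every code point and counts those whose upper- or lowercase mapping needs a different number of UTF-16 chars than the original. The class name `CaseWidthScan`, the helper names, the use of `Character.toLowerCase` for the third check, and the inclusive upper bound are all assumptions made for illustration.

```java
public class CaseWidthScan {
    // True if upper- or lowercasing cp changes how many UTF-16 chars
    // (1 for BMP, 2 for a surrogate pair) are needed to encode it.
    static boolean changesWidth(int cp) {
        int width = Character.charCount(cp);
        return Character.charCount(Character.toUpperCase(cp)) != width
            || Character.charCount(Character.toLowerCase(cp)) != width;
    }

    // Counts the code points for which a case mapping changes the width.
    // The bound is inclusive so that Character.MAX_CODE_POINT is checked too.
    static int countWidthChanges() {
        int count = 0;
        for (int cp = Character.MIN_CODE_POINT; cp <= Character.MAX_CODE_POINT; cp++) {
            if (changesWidth(cp)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("code points whose case mapping changes UTF-16 width: "
            + countWidthChanges());
    }
}
```

The result depends on the Unicode tables shipped with the JDK in use, so it is worth re-running after a JDK upgrade rather than treating one run as permanent proof.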

Re: Dubious stuff spotted in LowerCaseFilter

2015-10-22 Thread Dawid Weiss
I think the issue here is what happens if an "uppercase" codepoint requires a surrogate pair and the lowercase counterpart does not -- then the index variable would indeed be screwed.

Dawid

On Thu, Oct 22, 2015 at 10:05 AM, Uwe Schindler wrote:
> Hi,
>
> > Setting aside the fact that Character.

RE: Dubious stuff spotted in LowerCaseFilter

2015-10-22 Thread Uwe Schindler
Hi,

> Setting aside the fact that Character.toLowerCase is already dubious in some
> locales (e.g. Turkish),

This is not true. Character.toLowerCase() works locale-independent. It is only String.toLowerCase that works using default locale.

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213
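
The distinction drawn in this message can be checked with a few lines of Java. This is a small illustrative sketch, not code from the thread: the class name `LowerCaseLocaleDemo` and its helper methods are made up for the example. It passes the locale explicitly instead of relying on the machine's default, so the output does not depend on where it runs.

```java
import java.util.Locale;

public class LowerCaseLocaleDemo {
    // Character.toLowerCase(int) takes no locale argument at all.
    static int lowerCodePoint(int cp) {
        return Character.toLowerCase(cp);
    }

    // String.toLowerCase(Locale) applies locale-sensitive rules;
    // the no-argument overload uses the JVM default locale instead.
    static String lowerString(String s, Locale locale) {
        return s.toLowerCase(locale);
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr");

        // Print code points in hex to avoid console encoding issues.
        System.out.printf("Character.toLowerCase('I')   -> U+%04X%n",
            lowerCodePoint('I'));
        System.out.printf("\"I\".toLowerCase(ROOT)      -> U+%04X%n",
            lowerString("I", Locale.ROOT).codePointAt(0));
        System.out.printf("\"I\".toLowerCase(tr)        -> U+%04X%n",
            lowerString("I", turkish).codePointAt(0));
    }
}
```

Under the Turkish locale, `"I"` lowercases to dotless `ı` (U+0131), while `Character.toLowerCase('I')` and the `Locale.ROOT` call both give `i` (U+0069).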