Re: RFR: 8311220: Optimization for StringLatin UpperLower [v4]

温绍锦 Sun, 03 Sep 2023 10:38:25 -0700

On Sun, 3 Sep 2023 12:33:18 GMT, Claes Redestad <redes...@openjdk.org> wrote:


> The two odd codepoints I was curious about are `0xaa` and `0xba`, both of 
> which are lower-case according to `Character.isLowerCase(..)` but do not 
> actually have an uppercase. The Unicode data categorize these two as `Lo`, 
> Letter, other, so I'm a little confused how they got tagged as lowercase.
> 
> `Character.toUpperCaseEx` is specified as adhering to the definition of the 
> Unicode data (unlike some legacy Java character definitions that might differ 
> subtly) so perhaps it's reasonable to specify this newly invented 
> `isLowerCaseEx` as strictly adhering to the unicode data in which case I 
> think `0xaa` and `0xba` should not be considered lower case. I am not a 
> domain expert and would like someone like @naotoj to weigh in here. But 
> either way we should think about how to specify this kind of method to keep 
> it precise, even if it's only internal code.
> 
> I suggested `hasUpperCase` (or maybe `hasUpperCaseEx`) as a way out of this 
> particular conundrum, since it makes perfect sense to define a method named 
> like that to be equivalent to `return cp != 
> CharacterDataLatin1.instance.toUpperCaseEx(cp);`

I have renamed `isLowerCaseEx` to `hasNotUpperCaseEx`. Is this OK?
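
For reference, the mismatch discussed above can be reproduced with public API only. This is a minimal sketch, not the code in the PR: it assumes `Character.toUpperCase(int)` as a stand-in for the internal `CharacterDataLatin1.instance.toUpperCaseEx(cp)` (the two may differ for some code points), and `CaseProbe` and `hasUpperCase` are hypothetical names used only for illustration.

```java
public class CaseProbe {

    // Hypothetical helper: true if the code point changes under upper-casing.
    // Assumption: the public Character.toUpperCase(int) is used here in place of
    // the internal CharacterDataLatin1.instance.toUpperCaseEx(cp).
    static boolean hasUpperCase(int cp) {
        return cp != Character.toUpperCase(cp);
    }

    public static void main(String[] args) {
        // 'a' and 0xe0 have uppercase forms; 0xaa and 0xba are the two
        // code points questioned in the review.
        int[] samples = {'a', 0xaa, 0xba, 0xe0};
        for (int cp : samples) {
            System.out.printf("0x%02x isLowerCase=%b hasUpperCase=%b generalCategory=%d%n",
                    cp,
                    Character.isLowerCase(cp),
                    hasUpperCase(cp),
                    Character.getType(cp));
        }
    }
}
```

For `0xaa` and `0xba`, `Character.isLowerCase` reports true while the helper reports false, which is why a name based on "has an uppercase mapping" describes the check more precisely than one based on "is lower case".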

-------------

PR Comment: https://git.openjdk.org/jdk/pull/14751#issuecomment-1704360024
