Sherman wrote:

> As of the Unicode support in j.l.Character class,

>> What I most dearly love to see Java would be brought fully up to date
>> so that its basic Character class supports whatever the current Unicode
>> release happens to be. Wouldn't that be great?

> Java language specification clearly specifies in [2] that Java platform
> tracks Unicode specification as it evolves. The up coming JDK7 will base
> its character data on Unicode 6.0. So Java platform IS fully up to date to
> the Unicode Standard, as its specification requires, but it does not
> necessarily mean it has to support "whatever" the Unicode offers, added in
> new releases.

Your statement there is one of those things that I find tricky to
understand. Being "fully up to date with the Unicode standard" appears
to mean different things to the two of us. What quite specifically does
it mean to you?

I've just spent most of the day reading up on conformance issues. There
is quite a bit in the Unicode 6.0 conformance chapter:

    http://www.unicode.org/versions/Unicode6.0.0/ch03.pdf

One thing I noticed there is that the first thing in section §3.2
Conformance Requirements is something I brought up in part 2:

    C1  A process shall not interpret a high-surrogate code point or a
        low-surrogate code point as an abstract character.

That *seems* to agree with me that it is incorrect for the regex engine
to allow either a high or a low surrogate in isolation to match as
though it were an abstract character. If you recall, my example was that
I did not feel that Java should allow "\uD83D" to match "^.$".

However, there may be *some* wiggle-room here. At least, it is not
completely obvious to me. See the long discussion under C10 about what
to do with code unit sequences that are ill-formed for a particular
encoding form.

In any event, we're getting to the important part now: what the Unicode
Standard really means.
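
As a concrete illustration of the C1 example above, here is a minimal
sketch in Java, not anything taken from the JDK itself. The class name
LoneSurrogateCheck and its two helper methods are hypothetical names made
up for this illustration; only Pattern.matches, Character.isHighSurrogate
and Character.isLowSurrogate are real JDK API. The program prints what
the regex engine reports for an isolated surrogate rather than assuming
the answer, and shows one way a caller could reject ill-formed UTF-16
before handing a string to the matcher.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
final class LoneSurrogateCheck {

    // Returns true if s contains a high or low surrogate code unit
    // that is not part of a well-formed high+low pair.
    static boolean hasLoneSurrogate(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)) {
                if (i + 1 < s.length() && Character.isLowSurrogate(s.charAt(i + 1))) {
                    i++; // skip the low half of a well-formed pair
                } else {
                    return true; // high surrogate with no partner
                }
            } else if (Character.isLowSurrogate(c)) {
                return true; // low surrogate with no preceding high
            }
        }
        return false;
    }

    // Assumed policy: treat the input as "one character" only if it is
    // well-formed UTF-16 AND the regex engine matches it against ^.$
    static boolean matchesSingleCodePoint(String s) {
        return !hasLoneSurrogate(s) && Pattern.matches("^.$", s);
    }

    public static void main(String[] args) {
        String lone = "\uD83D";        // isolated high surrogate
        String pair = "\uD83D\uDE00";  // well-formed surrogate pair

        // What the regex engine itself says, with no pre-check.
        System.out.println("engine, lone surrogate: " + Pattern.matches("^.$", lone));
        System.out.println("engine, surrogate pair: " + Pattern.matches("^.$", pair));

        // What the guarded version says.
        System.out.println("guarded, lone surrogate: " + matchesSingleCodePoint(lone));
        System.out.println("guarded, surrogate pair: " + matchesSingleCodePoint(pair));
    }
}
```

The pre-check is a caller-side workaround under the stated assumption,
not a claim about how the engine ought to be fixed; whether C1 or the
C10 discussion governs this case is exactly the open question.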

There is certainly a heck of a lot of stuff in Unicode, and just because
a platform doesn't implement every little bit of it does *not* mean that
that platform is somehow non-conformant. Perhaps that's what you were
saying, Sherman.

The next message will be mostly about tr18's RL1.2 Properties, which is
where I feel the most serious problems exist. There are two main
problems:

 #1: Several key properties required for a conforming implementation
     are missing.

 #2: You use certain Unicode-defined property names but assign them
     meanings different from what Unicode says you must give them.

--tom