Re: Proposed update to UTS#18

2011-04-15 Thread Tom Christiansen
I hope you all know there is a lot of handwaving at the end of my last posting. :) That's because it isn't actually implementable as things stand. There's no current way to track what was a single grapheme before the regex gets its hands on it if that regex engine is doing some sort of decompositi
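The snippet above is cut off, but the concern it raises appears to be that decomposition can split one user-visible character into several code points before the regex engine sees it. The following Java sketch only illustrates that effect; the class and method names (DecompositionDemo, nfd, findsBareE) are made up for this example and come from nowhere in the thread.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

public class DecompositionDemo {
    // Canonically decompose a string (NFD) with the standard java.text.Normalizer.
    static String nfd(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD);
    }

    // Does a plain, non-normalizing pattern "e" find a match anywhere in s?
    static boolean findsBareE(String s) {
        return Pattern.compile("e").matcher(s).find();
    }

    public static void main(String[] args) {
        String composed = "\u00E9";        // U+00E9, one code point, one grapheme
        String decomposed = nfd(composed); // U+0065 U+0301, two code points, still one grapheme

        System.out.println(composed.length());      // 1
        System.out.println(decomposed.length());    // 2
        System.out.println(findsBareE(composed));   // false
        System.out.println(findsBareE(decomposed)); // true: the base letter now matches on its own
    }
}
```

After decomposition the pattern matches half of what was a single grapheme, and nothing in the decomposed string records where the original boundary was.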

Re: Proposed update to UTS#18

2011-04-15 Thread Andy Heninger
On Fri, Apr 15, 2011 at 8:01 AM, Mark Davis ☕ wrote: > The biggest issue is that for any transformation that changes the number of characters, or rearranges them is problematic, for the reasons outlined in the PRI. An example might be /(a|b|c*(?=...)|...)(d|...|a)/, which for Danish (unde

Re: Proposed update to UTS#18

2011-04-15 Thread Mark Davis ☕
The biggest issue is that any transformation that changes the number of characters, or rearranges them, is problematic, for the reasons outlined in the PRI. An example might be /(a|b|c*(?=...)|...)(d|...|a)/, which for Danish (under a collation transform, strength 2) should match any of {aa, aA,.
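As a rough illustration of what "strength 2" means, here is a Java sketch using java.text.Collator at secondary strength, where case differences are ignored. It assumes the root locale and does not model the Danish tailoring the message refers to; it also compares whole strings only, which is much simpler than matching a regex under a collation transform. The class and method names are hypothetical.

```java
import java.text.Collator;
import java.util.Locale;

public class SecondaryStrengthDemo {
    // Compare two strings at collation strength 2 (secondary).
    // Assumption: root-locale rules, not the Danish tailoring discussed in the thread.
    static boolean equalAtSecondary(String a, String b) {
        Collator collator = Collator.getInstance(Locale.ROOT);
        collator.setStrength(Collator.SECONDARY); // case is a tertiary difference, so it is ignored
        return collator.compare(a, b) == 0;
    }

    public static void main(String[] args) {
        System.out.println(equalAtSecondary("aa", "aA")); // true: differs only in case
        System.out.println(equalAtSecondary("aa", "ab")); // false: differs in a base letter
    }
}
```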

Re: java.lang.Character lacuna #1 of 2

2011-04-15 Thread Xueming Shen
Tom, I have filed CR/RFE 7036910: j.l.Character.toLowerCaseCharArray/toTitleCaseCharArray for this request. The j.l.Character.toLowerCase/toUpperCase() documentation suggests using String.toLower/UpperCase() for case mapping if you want 1:M mappings taken care of. And if you trust the API :-), which you shou
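To make the 1:M point concrete, here is a small Java sketch contrasting the char-based and String-based mappings for U+00DF (ß), whose full uppercase mapping is two letters. It uses only existing JDK calls; the toLowerCaseCharArray/toTitleCaseCharArray methods named in the RFE are proposals and are not used here. The class and helper names are invented for this example.

```java
import java.util.Locale;

public class CaseMappingDemo {
    // 1:1 mapping: a char in, a char out, so a one-to-many mapping cannot be expressed.
    static char upperChar(char c) {
        return Character.toUpperCase(c);
    }

    // Full mapping: the result may be longer than the input.
    static String upperString(String s) {
        return s.toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        char sharpS = '\u00DF'; // LATIN SMALL LETTER SHARP S

        System.out.println(upperChar(sharpS) == sharpS);       // true: returned unchanged
        System.out.println(upperString("\u00DF"));             // SS
        System.out.println(upperString("stra\u00DFe").length()); // 7, one longer than the 6-char input
    }
}
```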