Re: Fwd: Re: Codereview Request: 7039066 j.u.rgex does not match TR#18 RL1.4 Simple Word Boundaries and RL1.2 Properties

2011-04-27 Thread Xueming Shen
Thanks Alan! webrev has been updated accordingly. -Sherman On 4/27/2011 8:51 AM, Alan Bateman wrote: Xueming Shen wrote: : UNICODE_CHARACTER_CLASS is clear and straightforward. I am OK with it. The webrev, ccc and api docs have been updated accordingly. Yes, I still need a reviewer for th
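A minimal sketch of what the agreed flag changes, assuming a Java 7 or later runtime where `Pattern.UNICODE_CHARACTER_CLASS` exists. The class and method names (`UnicodeWordDemo`, `firstWord`) are hypothetical. The sketch shows only that `\w` is US-ASCII by default and follows the TR#18 definition once the flag is set.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnicodeWordDemo {
    // Returns the first run of \w characters in the input, or null if there is none.
    static String firstWord(String input, int flags) {
        Matcher m = Pattern.compile("\\w+", flags).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String cafe = "caf\u00e9"; // "café", escaped to avoid source-encoding issues

        // Default mode: \w is [a-zA-Z_0-9], so the match stops before U+00E9.
        System.out.println(firstWord(cafe, 0));
        // UNICODE_CHARACTER_CLASS: \w uses the Unicode definition, so U+00E9 is included.
        System.out.println(firstWord(cafe, Pattern.UNICODE_CHARACTER_CLASS));
    }
}
```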

Re: Fwd: Re: Codereview Request: 7039066 j.u.rgex does not match TR#18 RL1.4 Simple Word Boundaries and RL1.2 Properties

2011-04-27 Thread Alan Bateman
Xueming Shen wrote: : UNICODE_CHARACTER_CLASS is clear and straightforward. I am OK with it. The webrev, ccc and api docs have been updated accordingly. Yes, I still need a reviewer for the implementation changes. Tom has helped review the doc (and the definition of those properties). I've g

Re: Fwd: Re: Codereview Request: 7039066 j.u.rgex does not match TR#18 RL1.4 Simple Word Boundaries and RL1.2 Properties

2011-04-26 Thread Xueming Shen
Codereview Request: 7039066 j.u.rgex does not match TR#18 RL1.4 Simple Word Boundaries and RL1.2 Properties Date: Sat, 23 Apr 2011 17:53:42 -0700 From: Xueming Shen <mailto:xueming.s...@oracle.com> To: Tom Christiansen <mailto:tchr...@perl

Re: Fwd: Re: Codereview Request: 7039066 j.u.rgex does not match TR#18 RL1.4 Simple Word Boundaries and RL1.2 Properties

2011-04-26 Thread Xueming Shen
On 04-26-2011 2:20 AM, Alan Bateman wrote: Xueming Shen wrote: Thanks Mark! Let's go with UNICODE_PROPERTY, if there is no objection. I went through the updates to the javadoc and the approach looks good and nicely done. A minor comment is that the compile(String,int) method repeats the list

Re: Fwd: Re: Codereview Request: 7039066 j.u.rgex does not match TR#18 RL1.4 Simple Word Boundaries and RL1.2 Properties

2011-04-26 Thread Alan Bateman
Xueming Shen wrote: Thanks Mark! Let's go with UNICODE_PROPERTY, if there is no objection. I went through the updates to the javadoc and the approach looks good and nicely done. A minor comment is that the compile(String,int) method repeats the list of flags that are allowed so that should be
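Alan's comment concerns the flags accepted by `compile(String,int)`. A small sketch, assuming Java 7 or later, of two ways to request the new mode: the flag constant, which combines with other flags by bitwise OR, and the embedded flag expression `(?U)` that the shipped `Pattern` javadoc documents for it. `FlagFormsDemo` and its methods are illustrative names only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FlagFormsDemo {
    // Requests Unicode character classes with the flag constant.
    static boolean viaConstant(String s) {
        return Pattern.compile("\\w+", Pattern.UNICODE_CHARACTER_CLASS).matcher(s).matches();
    }

    // Requests the same mode with the embedded flag expression (?U).
    static boolean viaEmbeddedFlag(String s) {
        return Pattern.compile("(?U)\\w+").matcher(s).matches();
    }

    // Flags are bit masks, so several can be combined with bitwise OR.
    // Counts the lines that consist entirely of Unicode word characters.
    static int countWordOnlyLines(String text) {
        int flags = Pattern.MULTILINE | Pattern.UNICODE_CHARACTER_CLASS;
        Matcher m = Pattern.compile("^\\w+$", flags).matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String greek = "\u03b1\u03b2\u03b3"; // Greek small letters alpha, beta, gamma
        System.out.println(viaConstant(greek));     // true
        System.out.println(viaEmbeddedFlag(greek)); // true
        System.out.println(countWordOnlyLines("caf\u00e9\nna ive\n" + greek)); // 2
    }
}
```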

Re: Codereview Request: 7039066 j.u.rgex does not match TR#18 RL1.4 Simple Word Boundaries and RL1.2 Properties

2011-04-24 Thread Xueming Shen
Thanks Tom! j.u.regex does not have its own direct access to PropList for now, so it has to use the properties from the j.l.Character class. I will have to move those CharacterDataNN classes from the java.lang package (package private) to sun.lang or somewhere that both j.l.Character and j.u.regex
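The split Sherman describes, between general-category data from UnicodeData.txt and properties that also depend on PropList.txt (Alphabetic is derived partly from its Other_Alphabetic entries), is visible in the Java 7 and later APIs as `Character.isLetter` versus `Character.isAlphabetic`, and `\p{L}` versus `\p{IsAlphabetic}`. A hedged sketch using U+2160 ROMAN NUMERAL ONE, whose general category is Nl, so it is Alphabetic but not a letter. The `classify` helper is hypothetical.

```java
import java.util.regex.Pattern;

public class AlphabeticVsLetterDemo {
    // Reports how category-based and property-based tests treat one code point.
    static String classify(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        return "isLetter=" + Character.isLetter(codePoint)
                + " isAlphabetic=" + Character.isAlphabetic(codePoint)
                + " gcL=" + Pattern.matches("\\p{L}", s)
                + " alphabeticProp=" + Pattern.matches("\\p{IsAlphabetic}", s);
    }

    public static void main(String[] args) {
        // U+2160 ROMAN NUMERAL ONE: general category Nl, so Alphabetic but not a letter.
        // prints: isLetter=false isAlphabetic=true gcL=false alphabeticProp=true
        System.out.println(classify(0x2160));
        // U+0041 LATIN CAPITAL LETTER A, for comparison: every test is true.
        System.out.println(classify('A'));
    }
}
```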

Re: Fwd: Re: Codereview Request: 7039066 j.u.rgex does not match TR#18 RL1.4 Simple Word Boundaries and RL1.2 Properties

2011-04-24 Thread Mark Davis ☕
On 4/23/2011 6:50 PM, Xueming Shen wrote: > Forwarding...forgot to include the list. > Original Message Subject: Re: Codereview Request: 7039066 j.u.rgex does not match TR#18 RL1.4 Simple Word Boundaries and RL1.2 Properties Date: Sat, 23 Apr 2011 17:53:42 -0

Re: Codereview Request: 7039066 j.u.rgex does not match TR#18 RL1.4 Simple Word Boundaries and RL1.2 Properties

2011-04-24 Thread Tom Christiansen
Xueming, the docs look good. On the name of the flag, I have no strong feelings one way or the other. Perhaps between UNICODE_PROPERTIES and UNICODE_CLASSES, I would prefer the second one. The first makes me think of the regular properties like \p{Script=Greek} from RL1.2, not the compat proper
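Tom's example `\p{Script=Greek}` is an RL1.2 property. In Java 7 and later, `Pattern` accepts the keyword form `\p{script=Greek}` and the `Is` prefix form `\p{IsGreek}` for scripts, which differ from the block form `\p{InGreek}`. This sketch, with a hypothetical class name, uses U+03E2 COPTIC CAPITAL LETTER SHEI, which sits in the Greek and Coptic block but has Script=Coptic.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ScriptVsBlockDemo {
    // Returns the first run of characters whose Script property is Greek, or null.
    static String firstGreekRun(String input) {
        Matcher m = Pattern.compile("\\p{script=Greek}+").matcher(input);
        return m.find() ? m.group() : null;
    }

    // True if the single character has Script=Greek.
    static boolean isGreekScript(String ch) {
        return Pattern.matches("\\p{IsGreek}", ch);
    }

    // True if the single character lies in the Greek and Coptic block.
    static boolean inGreekBlock(String ch) {
        return Pattern.matches("\\p{InGreek}", ch);
    }

    public static void main(String[] args) {
        String text = "abc \u03b1\u03b2\u03b3 def"; // contains Greek alpha, beta, gamma
        System.out.println(firstGreekRun(text).length()); // 3
        String shei = "\u03e2"; // COPTIC CAPITAL LETTER SHEI
        System.out.println(inGreekBlock(shei));  // true: block membership
        System.out.println(isGreekScript(shei)); // false: its script is Coptic
    }
}
```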

Re: Fwd: Re: Codereview Request: 7039066 j.u.rgex does not match TR#18 RL1.4 Simple Word Boundaries and RL1.2 Properties

2011-04-24 Thread Xueming Shen
Simple Word Boundaries and RL1.2 Properties Date: Sat, 23 Apr 2011 17:53:42 -0700 From: Xueming Shen To: Tom Christiansen Mark, Tom, I agree with Mark that UNICODE_SPEC is a better name than UNICODE_CHARSET. We will have to deal with the "compatibility" issue Tom mentio

Fwd: Re: Codereview Request: 7039066 j.u.rgex does not match TR#18 RL1.4 Simple Word Boundaries and RL1.2 Properties

2011-04-23 Thread Xueming Shen
Forwarding...forgot to include the list. Original Message Subject: Re: Codereview Request: 7039066 j.u.rgex does not match TR#18 RL1.4 Simple Word Boundaries and RL1.2 Properties Date: Sat, 23 Apr 2011 17:53:42 -0700 From: Xueming Shen To: Tom Christiansen Mark

Re: Codereview Request: 7039066 j.u.rgex does not match TR#18 RL1.4 Simple Word Boundaries and RL1.2 Properties

2011-04-23 Thread Tom Christiansen
Mark Davis ☕ wrote on Sat, 23 Apr 2011 09:09:55 PDT: > The changes sound good. They sure do, don't they? I'm quite happy about this. I think it is more important to get this in the queue than that it (necessarily) be done for JDK7. That said, having a good tr18 RL1 story for JDK7's Unico

Re: Codereview Request: 7039066 j.u.rgex does not match TR#18 RL1.4 Simple Word Boundaries and RL1.2 Properties

2011-04-23 Thread Mark Davis ☕
The changes sound good. The flag UNICODE_CHARSET will be misleading, since all of Java uses the Unicode Charset (= encoding). How about UNICODE_SPEC, or something that gives that flavor? Mark *The best is the enemy of the good.* On Sat, Apr 23, 2011 at 01:12, Xueming Shen wrote: > The flag

Re: Codereview Request: 7039066 j.u.rgex does not match TR#18 RL1.4 Simple Word Boundaries and RL1.2 Properties

2011-04-23 Thread Xueming Shen
The flag this request proposes to add is UNICODE_CHARSET, not the "UNICODE_UNICODE" in my last email. My apologies for the typo. Any suggestion for a better name? It was UNICODE_CHARACTERCLASS, but then it became UNICODE_CHARSET, considering UNICODE_CASE. -Sherman On 4/23/2011 1:00 AM, Xuemi
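Sherman's note ties the naming to the existing `UNICODE_CASE` flag. As eventually shipped under the name `UNICODE_CHARACTER_CLASS`, the Java 7 javadoc states that the new flag implies `UNICODE_CASE`. A minimal sketch, with a hypothetical helper, of how that interacts with `CASE_INSENSITIVE`, which by default folds US-ASCII letters only.

```java
import java.util.regex.Pattern;

public class CaseFlagDemo {
    // Whole-input match of regex against input under the given flags.
    static boolean matches(String regex, String input, int flags) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        String lower = "\u00e9t\u00e9"; // "été"
        String upper = "\u00c9T\u00c9"; // "ÉTÉ"

        // CASE_INSENSITIVE alone folds US-ASCII letters only: false.
        System.out.println(matches(lower, upper, Pattern.CASE_INSENSITIVE));
        // Adding UNICODE_CASE enables Unicode-aware folding: true.
        System.out.println(matches(lower, upper, Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE));
        // UNICODE_CHARACTER_CLASS implies UNICODE_CASE: true.
        System.out.println(matches(lower, upper, Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CHARACTER_CLASS));
    }
}
```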

Codereview Request: 7039066 j.u.rgex does not match TR#18 RL1.4 Simple Word Boundaries and RL1.2 Properties

2011-04-23 Thread Xueming Shen
Hi, This proposal tries to address (1) j.u.regex does not meet Unicode regex's Simple Word Boundaries [1] requirement, as Tom pointed out in his email on the i18n-dev list [2]. Basically we have 3 problems here. a. The j.u.regex word boundary constructs \b and \B use Unicode \p{letter} + \p{digit
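Problem (a) is that `\b` and `\w` disagree about what a word character is. A hedged sketch, assuming Java 7 or later: with `UNICODE_CHARACTER_CLASS` set, `\b` and `\w` share the Unicode word definition, so `\b\w+\b` extracts accented words intact. Only the flagged behavior is exercised, because unflagged `\b` has not behaved identically in every JDK release. `words` is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordBoundaryDemo {
    // Collects every \b\w+\b token, with \b and \w both in Unicode mode.
    static List<String> words(String input) {
        Pattern p = Pattern.compile("\\b\\w+\\b", Pattern.UNICODE_CHARACTER_CLASS);
        Matcher m = p.matcher(input);
        List<String> result = new ArrayList<>();
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        // "café, naïve résumé" written with escapes
        List<String> found = words("caf\u00e9, na\u00efve r\u00e9sum\u00e9");
        System.out.println(found.size()); // 3
    }
}
```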

Re: RL1.4 Simple Word Boundaries

2011-01-24 Thread Tom Christiansen
Sherman wrote: > Regarding RL1.4.(1), the U+200C and U+200D are obviously a bug that > the Java regex failed to update the implementation to sync with the > tr#18 update, it appears these two don't "exist" in RL1.4/v9, > neither does RL1.2a, the compatibility properties. > The words for 1.4(1)
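RL1.2a, the compatibility properties mentioned here, covers classes such as `\d` and the POSIX-style `\p{Alpha}`. In Java 7 and later these remain US-ASCII by default and switch to the Unicode definitions (`\p{IsDigit}`, `\p{IsAlphabetic}`) under `UNICODE_CHARACTER_CLASS`. A small sketch with a hypothetical class name:

```java
import java.util.regex.Pattern;

public class CompatPropertyDemo {
    // Whole-input match of regex against input under the given flags.
    static boolean matches(String regex, String input, int flags) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        int unicode = Pattern.UNICODE_CHARACTER_CLASS;
        String arabicThree = "\u0663"; // ARABIC-INDIC DIGIT THREE, gc=Nd
        String eAcute = "\u00e9";      // LATIN SMALL LETTER E WITH ACUTE

        System.out.println(matches("\\d", arabicThree, 0));         // false: \d is [0-9]
        System.out.println(matches("\\d", arabicThree, unicode));   // true: \d is \p{IsDigit}
        System.out.println(matches("\\p{Alpha}", eAcute, 0));       // false: US-ASCII only
        System.out.println(matches("\\p{Alpha}", eAcute, unicode)); // true: \p{IsAlphabetic}
    }
}
```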

Re: RL1.4 Simple Word Boundaries (actually, RL1.2 & RL1.2a)

2011-01-24 Thread Tom Christiansen
Sherman wrote: > Thanks for the detailed and excellent "reality check". While I'm still > going through all the details it appears that the fact the current > Java Unicode property data does not include the properties defined in > PropList.txt (current implementation reads the property data only f
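The quoted observation is that Java's property data did not include the properties defined in PropList.txt. White_Space is one such property. A hedged sketch, assuming Java 7 or later, comparing `Character.isWhitespace` (which deliberately excludes no-break spaces) with `\p{IsWhite_Space}`, the default `\s`, and the flagged `\s`, for U+00A0 NO-BREAK SPACE and U+0085 NEXT LINE. `WhiteSpaceDemo` and `summarize` are hypothetical names.

```java
import java.util.regex.Pattern;

public class WhiteSpaceDemo {
    // \s compiled in Unicode mode, where it is defined as \p{IsWhite_Space}.
    private static final Pattern UNICODE_SPACE =
            Pattern.compile("\\s", Pattern.UNICODE_CHARACTER_CLASS);

    // Summarizes how several whitespace tests treat one code point.
    static String summarize(int cp) {
        String s = new String(Character.toChars(cp));
        return "isWhitespace=" + Character.isWhitespace(cp)
                + " whiteSpaceProp=" + Pattern.matches("\\p{IsWhite_Space}", s)
                + " asciiS=" + Pattern.matches("\\s", s)
                + " unicodeS=" + UNICODE_SPACE.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(summarize(0x00A0)); // NO-BREAK SPACE
        System.out.println(summarize(0x0085)); // NEXT LINE (NEL)
        System.out.println(summarize(0x0020)); // SPACE, for comparison
    }
}
```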

Re: RL1.4 Simple Word Boundaries

2011-01-23 Thread Xueming Shen
them in detail, file corresponding bug/rfe into our database and then follow up from there. -Sherman On 1-23-2011 11:44 AM, Tom Christiansen wrote: Java does not meet this requirement. Specifically, it does not offer a mechanism for stipulation #1 cited below: RL1.4 Simpl

RL1.4 Simple Word Boundaries

2011-01-23 Thread Tom Christiansen
Java does not meet this requirement. Specifically, it does not offer a mechanism for stipulation #1 cited below: RL1.4 Simple Word Boundaries To meet this requirement, an implementation shall extend the word boundary mechanism so that: (1) The class of <word_character> includes all the
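RL1.4 also asks that nonspacing marks never be divided from their base characters. A minimal sketch, assuming Java 7 or later, with a hypothetical class name: under `UNICODE_CHARACTER_CLASS` the javadoc defines `\w` to include `\p{gc=Mn}`, so a decomposed "naïve" (an `i` followed by U+0308 COMBINING DIAERESIS) is matched as one word, while the default ASCII `\w` stops at the mark.

```java
import java.util.regex.Pattern;

public class CombiningMarkDemo {
    // True if the whole input is one run of \w characters under the given flags.
    static boolean isSingleWord(String input, int flags) {
        return Pattern.compile("\\w+", flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        // "naïve" in decomposed form: 'i' followed by U+0308 COMBINING DIAERESIS (gc=Mn).
        String decomposed = "nai\u0308ve";

        // Default \w is [a-zA-Z_0-9], so the combining mark breaks the word: false.
        System.out.println(isSingleWord(decomposed, 0));
        // Unicode \w includes \p{gc=Mn}, so the mark stays with its base: true.
        System.out.println(isSingleWord(decomposed, Pattern.UNICODE_CHARACTER_CLASS));
    }
}
```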