Re: RL1.4 Simple Word Boundaries (actually, RL1.2 & RL1.2a)

2011-01-24 Thread Tom Christiansen
Sherman wrote:
> Thanks for the detailed and excellent "reality check". While I'm still
> going through all the details it appears that the fact the current
> Java Unicode property data does not include the properties defined in
> PropList.txt (current implementation reads the property data only f

Re: RL1.4 Simple Word Boundaries

2011-01-24 Thread Tom Christiansen
Sherman wrote:
> Regarding RL1.4.(1), the U+200C and U+2000 are obviously a bug that
> the Java regex failed to update the implementation to sync with the
> tr#18 update, it appears these two don't "exists" in RL1.4/v9,
> neither does RL1.2a, the compatibility properties.
> The words for 1.4(1)

Re: RL1.1 Hex Notation

2011-01-24 Thread Tom Christiansen
Sherman wrote:
> Introducing in the new perl style \x{...} as the hexadecimal notation
> appears to be a nice-to-have enhancement (I will file a RFE to put this
> request in record). But I don't think you can simply deny that the Java
> Unicode escape sequences for UTF16 is NOT A "mechanism"/notat

Re: RL1.1 Hex Notation

2011-01-24 Thread Xueming Shen
Tom, I would not over-read this too much :-) There is no reason for tr#18 to use any specific encoding in the specification; it's a perfect choice to simply pick the syntax notation that uses the code point value directly. However I don't think this "sample" syntax (or might be even further
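
For readers skimming the Hex Notation messages, the two notations being contrasted can be sketched in Java roughly as follows. This is only an illustration, not code from the thread: the class name HexNotationSketch, its method names, and the choice of U+1D11E as the sample code point are all made up here, and the brace form assumes a Java release whose java.util.regex.Pattern accepts \x{h...h} (Java 7 or later).

```java
import java.util.regex.Pattern;

// Hypothetical sketch; names and the sample code point are illustrative assumptions.
public class HexNotationSketch {
    // U+1D11E MUSICAL SYMBOL G CLEF, a code point outside the BMP.
    static final int CODE_POINT = 0x1D11E;

    // Build the subject text from the code point value itself.
    static String subject() {
        return new String(Character.toChars(CODE_POINT));
    }

    // Java source escapes: the compiler turns the literal into the
    // UTF-16 surrogate pair D834 DD1E before the regex engine sees it.
    static boolean matchesSurrogateEscapes() {
        return Pattern.matches("\uD834\uDD1E", subject());
    }

    // Brace notation: the regex engine itself parses the code point value.
    // Assumes Java 7 or later, where Pattern supports this form.
    static boolean matchesBraceNotation() {
        return Pattern.matches("\\x{1D11E}", subject());
    }

    public static void main(String[] args) {
        System.out.println(matchesSurrogateEscapes());
        System.out.println(matchesBraceNotation());
    }
}
```

Both patterns describe the same single code point. The difference is that the first requires the author to work out the UTF-16 code units by hand, while the second states the code point value directly.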