Sherman,
Since you're looking through my messages for potential RFEs,
I thought I would point out a pair of low-hanging fruit for you.
tr18 contains two distinct strong recommendations, both of which should
be quite easy to convert into RFEs. As recommendations, even strong
ones, they are of course
Sherman, referring to Java's ASCII-only senses of \w and \s,
and of \p{alpha} and \p{space}, wrote:
> (does Perl 5 work in this way as well?)
No, not for a very, very long time. For most of Perl's life,
charclass escapes like \w have always been Unicode aware.
However, it did take us some time
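The ASCII-only sense of \w discussed above can be seen in a small sketch. It assumes a Java 7 or later runtime, where the Pattern.UNICODE_CHARACTER_CLASS flag is available to opt in to the Unicode sense; the class and method names are hypothetical and chosen only for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; not part of any library.
class WordClassDemo {
    // Returns true if the whole input matches \w+ under the given Pattern flags.
    static boolean isWord(String input, int flags) {
        return Pattern.compile("\\w+", flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        String word = "\u00e9t\u00e9"; // the French word "ete" with acute accents

        // Default behaviour: \w is [a-zA-Z_0-9], so the accented letters do not match.
        System.out.println(isWord(word, 0)); // prints false

        // With the Java 7+ flag, \w follows the Unicode definition of a word character.
        System.out.println(isWord(word, Pattern.UNICODE_CHARACTER_CLASS)); // prints true
    }
}
```

The flag is opt-in, so existing patterns keep their ASCII-only behaviour unless the caller asks for the Unicode one.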
When I set about resolving the Unicode troubles in Java regular
expressions through rewriting them into something Java understood,
I found it convenient to divide that functionality into two different
rewriting functions, one to handle string escapes like \u and the
other to handle charclas
Now I will discuss the more interesting of my two functions, the one that
handles charclass escapes such as those given in RL1.2a. The particular
Level 1 places where this code is relevant are RL1.2a's Annex C Compatibility
Properties, RL1.4 Simple Word Boundaries, and RL1.6 Line Boundaries.
I do
That concludes my discoveries, analysis, and remediations related
to j.u.r.Pattern's conformance with tr18's Level 1 requirements.
I would be interested in guidance toward how I can best help you
now that all that is done.
Would you all like some time to absorb and digest this set of
writings fr
Tom,
The fact that these POSIX/ASCII only version properties/constructs have been
there for years ("compatibility") and it appears that "most" developers
are happy (habit, performance...) with them, I don't think we can and want to
switch to the Unicode version, simply for conformance. Java ta
> The fact that these POSIX/ASCII only version properties/constructs
> have been there for years ("compatibility") and it appears that "most"
> developers are happy (habit, performance...) with them, I don't think
> we can and want to switch to the Unicode version, simply for
> conformance.
I agr
Tom,
Yes, I would need some time to digest all the technical details, though
I believe I've had a good understanding of most issues you raised.
Sure, I will keep you updated on the related RFEs I will submit based
on your research.
The CR#s I have so far are
7014645: Support Perl style Uni
Hi all,
Just like to say that this is why I got involved in the combination of Java
and open source - it really does lift the spirits to see this sort of
discourse, even if a majority of the technical details fly over my head (all
of you are plain scary ;p)! I'm speaking at a number of conference
The goal of the clause is to have a mechanism for using hex values for
character literals. That is, you should be able to take a code point from 0
to 10FFFF, get a hex value for that, embed it in some syntax, and
concatenate it into a pattern, and have it work as a literal.
For example:
String pa
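A minimal sketch of the hex-literal idea described above, not the code from the original message. It assumes a Java 7 or later runtime, where java.util.regex.Pattern accepts the \x{h...h} notation for any code point; the helper names hexLiteral and matchesLiteral are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; not part of any library.
class HexLiteralDemo {
    // Builds a pattern fragment that denotes exactly one code point, written as \x{H...H}.
    static String hexLiteral(int codePoint) {
        if (!Character.isValidCodePoint(codePoint)) {
            throw new IllegalArgumentException("not a code point: " + codePoint);
        }
        return String.format("\\x{%X}", codePoint);
    }

    // Concatenates the fragment into a character class and checks that it
    // matches the one-code-point string for the same value.
    static boolean matchesLiteral(int codePoint) {
        String text = new String(Character.toChars(codePoint));
        Pattern p = Pattern.compile("[" + hexLiteral(codePoint) + "]");
        return p.matcher(text).matches();
    }

    public static void main(String[] args) {
        // One BMP letter, one BMP symbol, and two supplementary code points.
        for (int cp : new int[] {0x41, 0x20AC, 0x12345, 0x10FFFF}) {
            System.out.println(hexLiteral(cp) + " -> " + matchesLiteral(cp));
        }
    }
}
```

Because the fragment is built from the numeric value alone, the caller never has to split a supplementary code point into surrogates by hand.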
The Unicode Standard distinguishes between Unicode Strings (16-bit) and
UTF-16. In the former, which is often the form used in programming
languages, a singleton value of 0xD800..0xDFFF is allowed, and is treated as
if it were a reserved code point.
So you do get some funny cases, because
1. 0
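The distinction drawn above between Unicode strings and UTF-16 shows up directly in Java, whose String is a sequence of 16-bit units that may contain an unpaired surrogate. The following is a small sketch under that assumption; the class and the helper isWellFormedUtf16 are hypothetical names introduced for illustration.

```java
// Hypothetical demo class; not part of any library.
class LoneSurrogateDemo {
    // Counts code points; Java counts an unpaired surrogate as one code point.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // True if every surrogate in s belongs to a high+low pair,
    // that is, if s is well-formed UTF-16 and not merely a 16-bit Unicode string.
    static boolean isWellFormedUtf16(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)) {
                if (i + 1 < s.length() && Character.isLowSurrogate(s.charAt(i + 1))) {
                    i++; // skip the low half of a valid pair
                } else {
                    return false; // high surrogate with no partner
                }
            } else if (Character.isLowSurrogate(c)) {
                return false; // low surrogate with no preceding high surrogate
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String paired = new String(Character.toChars(0x12345)); // two chars, one code point
        String lone = String.valueOf((char) 0xD800) + "x";      // unpaired high surrogate, then 'x'

        System.out.println(paired.length() + " " + codePoints(paired) + " " + isWellFormedUtf16(paired));
        System.out.println(lone.length() + " " + codePoints(lone) + " " + isWellFormedUtf16(lone));
    }
}
```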
There's been a long and interesting discussion on this list.
We are coming up to a quarterly Unicode Technical Committee meeting
(starting Feb 7), so there is the opportunity to make requests / proposals
about UTS18. In particular, if there are areas of the spec that are unclear
or features that p
Hi Mark,
I guess you are asking for something like this?
char[] cc = Character.toChars(0x12345);
Matcher m = Pattern.compile("["
+ "\\u" + HEX(cc[0])
+ "\\u" + HEX(cc[1])
+ "