On 01/20/2011 12:55 PM, Tom Christiansen wrote:
Sherman wrote:
In the end, Java RegEx is NOT a Unicode RegEx, while it
supports Unicode RegEx at certain level, sometimes via different
syntax. I don't feel this is a big problem for most Java
developers, and it should not be a stopper for most programs.
I do not understand what you mean when you say that Java regexes
aren't Unicode regexes. Are you referring to the various
syntactic features of UTS 18, Unicode Regular Expressions?
If so, it's my understanding that many of those are examples
only, especially when it comes to how something actually looks.
I fully agree with you that Java indeed offers some of the
functionality described there in ways other than those
particular examples, and that quite often this doesn't make
enough practical difference to be a show-stopper. I discuss
this further later in this message.
Another possible interpretation of:
Java RegEx is NOT a Unicode RegEx, while it
supports Unicode RegEx at certain level,
is that you are saying the standard Java regex class does
not provide the baseline Level 1 Unicode support spelled
out in UTS#18. If that is what you mean, then I'm afraid
you are again correct.
However, I would very much like to see this fixed, because
Level 1 support is the absolute minimum required for useful
Unicode support. To quote from UTS#18:
Hi Tom,
That is NOT what I'm saying.
Java RegEx is supposed to be "in conformance with level 1 of UTS#18
plus RL2.1 Canonical Equivalents", so everything defined in UTS#18
Level 1 should be supported by Java RegEx, though not necessarily
with the exact syntax defined or recommended by UTS#18, and not
always out of the box. For example, for Unicode case-insensitive
matching you have to specify a particular "flag" to turn it on,
mainly for performance reasons.
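For illustration only (this sketch is not part of the original exchange, and the class and method names are hypothetical): assuming a reasonably current JDK, the snippet below shows the opt-in flags on java.util.regex.Pattern. CASE_INSENSITIVE on its own folds only US-ASCII letters; adding UNICODE_CASE makes the folding Unicode-aware (the Pattern documentation notes this may cost some performance); CANON_EQ turns on canonical-equivalence matching, which is the RL2.1 part of the conformance statement.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; only standard java.util.regex.Pattern flags are used.
public class RegexFlagsDemo {
    // U+00E4 LATIN SMALL LETTER A WITH DIAERESIS and its uppercase form U+00C4.
    static final String LOWER_A_UMLAUT = "\u00e4";
    static final String UPPER_A_UMLAUT = "\u00c4";

    // CASE_INSENSITIVE alone: only US-ASCII letters are folded, so this should not match.
    public static boolean asciiCaseInsensitive() {
        return Pattern.compile(LOWER_A_UMLAUT, Pattern.CASE_INSENSITIVE)
                .matcher(UPPER_A_UMLAUT).matches();
    }

    // CASE_INSENSITIVE | UNICODE_CASE: Unicode-aware folding, so this should match.
    public static boolean unicodeCaseInsensitive() {
        return Pattern.compile(LOWER_A_UMLAUT, Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE)
                .matcher(UPPER_A_UMLAUT).matches();
    }

    // Without CANON_EQ: "a" + COMBINING RING ABOVE is compared code point by code point
    // against the precomposed U+00E5, so this should not match.
    public static boolean withoutCanonEq() {
        return Pattern.compile("a\u030A").matcher("\u00E5").matches();
    }

    // With CANON_EQ: full canonical decompositions are compared, so this should match.
    public static boolean withCanonEq() {
        return Pattern.compile("a\u030A", Pattern.CANON_EQ).matcher("\u00E5").matches();
    }

    public static void main(String[] args) {
        System.out.println("CASE_INSENSITIVE only:         " + asciiCaseInsensitive());
        System.out.println("CASE_INSENSITIVE|UNICODE_CASE: " + unicodeCaseInsensitive());
        System.out.println("no CANON_EQ:                   " + withoutCanonEq());
        System.out.println("CANON_EQ:                      " + withCanonEq());
    }
}
```

The CANON_EQ case mirrors the "a\u030A" versus "\u00E5" example given in the Pattern documentation.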
I would really appreciate it if you could provide details of what is
missing for Level 1 support. Since that would be a violation of the
specification, I can definitely put it on the high-priority list to
work on. Script support is one of the Level 1 requirements that we
don't have in our latest release, but I have added it in the upcoming
JDK 7. I'm sure there are bugs and corner cases here and there, even
though we have lots of tests that are supposed to cover everything :-)
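As a rough sketch of the script support mentioned above (not from the original message; it assumes JDK 7 or later, and the class and method names are hypothetical), Pattern accepts script properties either with the Is prefix, as in \p{IsLatin}, or with the script= (or sc=) keyword, and java.lang.Character.UnicodeScript exposes the same data directly:

```java
import java.util.regex.Pattern;

// Hypothetical demo class; requires JDK 7+ for script properties and Character.UnicodeScript.
public class ScriptDemo {
    // Script property via the "script=" keyword form.
    private static final Pattern GREEK = Pattern.compile("\\p{script=Greek}+");
    // Script property via the "Is" prefix form.
    private static final Pattern LATIN = Pattern.compile("\\p{IsLatin}+");

    // True if the whole (non-empty) string consists of Greek-script code points.
    public static boolean isAllGreek(String s) {
        return GREEK.matcher(s).matches();
    }

    // True if the whole (non-empty) string consists of Latin-script code points.
    public static boolean isAllLatin(String s) {
        return LATIN.matcher(s).matches();
    }

    public static void main(String[] args) {
        String greek = "\u03b1\u03b2\u03b3"; // GREEK SMALL LETTERS ALPHA, BETA, GAMMA
        System.out.println(isAllGreek(greek));
        System.out.println(isAllLatin("abc"));
        // Looks up the script of a single code point without going through a regex.
        System.out.println(Character.UnicodeScript.of(greek.codePointAt(0)));
    }
}
```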
I have been working on Java I18n for years, so I fully understand how
important Unicode is, especially for Java as a platform. It's our goal
for Java to provide the most useful Unicode support, so the last thing
I would say is "go pick another language/platform". And no, I don't
take any offense at all. In fact, we really appreciate these useful
comments, suggestions, and expertise, which will definitely help the
platform evolve.
-Sherman