On 01/20/2011 12:55 PM, Tom Christiansen wrote:
Sherman wrote:

At the end, Java RegEx is NOT a Unicode RegEx, while it
supports Unicode RegEx at certain level, sometime via different
syntax, I don't feel this is a big problem for most Java
developers and should not be a stopper for most program.
I do not understand what you mean when you say that Java regexes
aren't Unicode regexes.  Are you referring to the various
syntactic features of UTS 18, Unicode Regular Expressions?
If so, it's my understanding that many of those are examples
only, especially when it comes to how something actually looks.

I fully agree with you that Java indeed offers some of the
functionality described there in other ways than given by those
particular examples, and that quite often this doesn't make
enough practical difference as to be a show-stopper.  I discuss
this further later on down in this message.

Another possible interpretation of:

Java RegEx is NOT a Unicode RegEx, while it
supports Unicode RegEx at certain level,
is that you are saying that the standard Java regex
class does not provide the baseline Level 1 Unicode
support spelled out in UTS#18, then I'm afraid you
are again correct.

However, I would very much like to see this fixed.  That's
because Level 1 support is the absolute minimum level required
for useful Unicode support. To quote from UTS#18:

Hi Tom,

That is NOT what I'm saying.

Java RegEx is supposed to be "in conformance with Level 1 of UTS#18 plus RL2.1 Canonical Equivalents", so anything defined in UTS#18 Level 1 should be supported by Java RegEx, though not necessarily with the exact syntax defined or recommended by UTS#18, and not necessarily out of the box. For example, to get Unicode case-insensitive matching you have to specify a particular flag to turn it on, basically for performance reasons.
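
To illustrate the flag in question, here is a minimal sketch, assuming the flag meant is Pattern.UNICODE_CASE used together with Pattern.CASE_INSENSITIVE. The class and method names are made up for this example.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names here are illustrative only.
public class UnicodeCaseDemo {

    // CASE_INSENSITIVE on its own folds case for US-ASCII letters only.
    static boolean matchesAsciiFold(String regex, String input) {
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE)
                      .matcher(input)
                      .matches();
    }

    // Adding UNICODE_CASE extends case folding to non-ASCII letters.
    static boolean matchesUnicodeFold(String regex, String input) {
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE)
                      .matcher(input)
                      .matches();
    }

    public static void main(String[] args) {
        String regex = "\u00e9cole";  // lower-case e-acute followed by "cole"
        String input = "\u00c9COLE";  // upper-case E-acute followed by "COLE"

        System.out.println(matchesAsciiFold(regex, input));
        System.out.println(matchesUnicodeFold(regex, input));

        // The same two flags can be set inline with the embedded form (?iu).
        System.out.println(Pattern.matches("(?iu)" + regex, input));
    }
}
```

Without UNICODE_CASE the accented letter is compared exactly, so only the ASCII part of the word is matched case-insensitively.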

I would really appreciate it if you could provide the details of what is missing from the Level 1 support. Since that would be a violation of the specification, I can definitely put it on my high-priority list to work on. Script support is one of the Level 1 requirements that we don't have in our latest release, but I have added it in the upcoming JDK 7. I'm sure there are bugs and corner cases here and there, even though we have lots of tests that are supposed to cover everything :-)
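
A rough sketch of what the script support looks like, assuming JDK 7 or later and the \p{IsGreek} and \p{script=Greek} forms documented for java.util.regex.Pattern. The class and method names are made up for this example.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; requires JDK 7 or later for script properties.
public class ScriptPropertyDemo {

    // Short form: the "Is" prefix followed by the script name.
    static boolean isAllGreek(String s) {
        return Pattern.matches("\\p{IsGreek}+", s);
    }

    // Long form: the "script" keyword (the abbreviation "sc" is also accepted).
    static boolean isAllGreekLongForm(String s) {
        return Pattern.matches("\\p{script=Greek}+", s);
    }

    public static void main(String[] args) {
        String greek = "\u03b1\u03b2\u03b3";  // alpha, beta, gamma
        String latin = "abc";

        System.out.println(isAllGreek(greek));
        System.out.println(isAllGreekLongForm(greek));
        System.out.println(isAllGreek(latin));
    }
}
```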

I have been working full-time on Java I18n for years, so I fully understand how important Unicode is, especially for Java as a platform. It is our goal to have Java provide the most useful Unicode support, so the last thing I would say is "go pick another language/platform". No, I don't feel any offense at all. In fact, we really appreciate these useful comments, suggestions, and expertise, which will definitely help the platform evolve.

-Sherman
