Hi Jernej,
Jernej Tuljak <[email protected]> wrote on 08/14/2013 03:41:17 AM:
> Hi,
>
> we're abusing org.apache.xerces.impl.xpath.regex.RegularExpression
Yep. :-)
> to validate XSD-flavor regular expression strings and later match
> test strings against them. It seemingly worked, until someone tried
> to use a very specific regex.
>
> Here's the code:
>
> import org.apache.xerces.impl.xpath.regex.RegularExpression;
>
> public class XercesRegexTest {
>
>     public static void main(String[] args) {
>         String regexString = "([a-zA-Z][^ ]*)";
>         RegularExpression regex = new RegularExpression(regexString, "x");
>         System.out.println(regex.toString());
>     }
>
> }
>
> The `x` option is supposed to make the regex engine conform to XSD
> regular expressions.
Only 'X' does that. That is the only option which Xerces uses internally.
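For reference, here is a minimal sketch of the same program using the upper-case option. It assumes Xerces-J 2.x's xercesImpl.jar is on the classpath, and XsdRegexDemo is just an illustrative class name. In XML Schema mode, matches() checks the whole string, so a value containing a space should be rejected by this pattern:

```java
import org.apache.xerces.impl.xpath.regex.RegularExpression;

// Illustrative demo class; assumes Xerces-J 2.x (xercesImpl.jar) on the classpath.
public class XsdRegexDemo {

    public static void main(String[] args) {
        String regexString = "([a-zA-Z][^ ]*)";
        // Upper-case "X" selects XML Schema mode; lower-case "x" is a different option.
        RegularExpression regex = new RegularExpression(regexString, "X");
        System.out.println(regex.toString());
        // XML Schema patterns are implicitly anchored: the entire string must match.
        System.out.println(regex.matches("abc123"));
        System.out.println(regex.matches("abc 123"));
    }
}
```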
> But if you run this code, you'll end up with
>
> Exception in thread "main" org.apache.xerces.impl.xpath.regex.ParseException: Unexpected end of the pattern in a character class.
>     at org.apache.xerces.impl.xpath.regex.RegexParser.ex(Unknown Source)
>     at org.apache.xerces.impl.xpath.regex.RegexParser.parseCharacterClass(Unknown Source)
>     at org.apache.xerces.impl.xpath.regex.RegexParser.parseAtom(Unknown Source)
>     at org.apache.xerces.impl.xpath.regex.RegexParser.parseFactor(Unknown Source)
>     at org.apache.xerces.impl.xpath.regex.RegexParser.parseTerm(Unknown Source)
>     at org.apache.xerces.impl.xpath.regex.RegexParser.parseRegex(Unknown Source)
>     at org.apache.xerces.impl.xpath.regex.RegexParser.processParen(Unknown Source)
>     at org.apache.xerces.impl.xpath.regex.RegexParser.parseAtom(Unknown Source)
>     at org.apache.xerces.impl.xpath.regex.RegexParser.parseFactor(Unknown Source)
>     at org.apache.xerces.impl.xpath.regex.RegexParser.parseTerm(Unknown Source)
>     at org.apache.xerces.impl.xpath.regex.RegexParser.parseRegex(Unknown Source)
>     at org.apache.xerces.impl.xpath.regex.RegexParser.parse(Unknown Source)
>     at org.apache.xerces.impl.xpath.regex.RegularExpression.setPattern(Unknown Source)
>     at org.apache.xerces.impl.xpath.regex.RegularExpression.setPattern(Unknown Source)
>     at org.apache.xerces.impl.xpath.regex.RegularExpression.<init>(Unknown Source)
>     at com.mgsoft.testing.regex.XercesRegexTest.main(XercesRegexTest.java:9)
> Java Result: 1
>
> It first looked like a bug in Xerces' regular expression parser, but
> after re-reading the documentation of this class
> (http://xerces.apache.org/xerces-j/apiDocs/org/apache/xerces/utils/regex/RegularExpression.html),
> I found out that the `x` option should actually be `X` (upper case).
The docs for that class probably haven't changed much over the years, but
it's worth pointing out that that's the Xerces-J 1.x documentation, not
Xerces-J 2.x.
> Thing is... it worked for countless other regular expressions. In
> fact, it is that space that is causing problems; any other char works
> fine. Also, removing the option and using the single-string
> constructor of `RegularExpression` works fine.
If you're not specifying 'X' then you're using a mode that isn't XSD and
that we never use.
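Since you need both to validate pattern strings and to match test strings, a small wrapper that always passes "X" keeps the mode from drifting. This is only a sketch: the class and method names (XsdPatterns, isValidPattern, matches) are hypothetical, and it assumes Xerces-J 2.x, where the RegularExpression constructor throws the unchecked ParseException for a malformed pattern. As far as the class's Javadoc goes, lower-case 'x' appears to be the extended-comment option, which skips whitespace in the pattern; that would plausibly explain why only the space inside [^ ] broke the parse.

```java
import org.apache.xerces.impl.xpath.regex.ParseException;
import org.apache.xerces.impl.xpath.regex.RegularExpression;

// Hypothetical helper; assumes Xerces-J 2.x (xercesImpl.jar) on the classpath.
public final class XsdPatterns {

    // Upper-case "X" = XML Schema regex mode, the only mode Xerces uses internally.
    private static final String XSD_MODE = "X";

    private XsdPatterns() {
    }

    /** Returns true if the pattern parses as an XML Schema regular expression. */
    public static boolean isValidPattern(String pattern) {
        try {
            new RegularExpression(pattern, XSD_MODE);
            return true;
        } catch (ParseException e) {
            // ParseException is unchecked; e.getMessage() carries the parser's reason.
            return false;
        }
    }

    /** Returns true if the entire value matches the XML Schema pattern. */
    public static boolean matches(String pattern, String value) {
        return new RegularExpression(pattern, XSD_MODE).matches(value);
    }

    public static void main(String[] args) {
        System.out.println(isValidPattern("([a-zA-Z][^ ]*)"));
        System.out.println(isValidPattern("[a-z"));
        System.out.println(matches("([a-zA-Z][^ ]*)", "hello"));
        System.out.println(matches("([a-zA-Z][^ ]*)", "1abc"));
    }
}
```

If the same pattern is matched against many values, compiling it once and reusing the RegularExpression instance avoids re-parsing it on every call.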
> Does anyone know why this is happening? I realize that this class is
> probably not intended for such usage, but since the spec we're
> implementing uses XSD regular expressions, we tried to avoid
> reinventing the wheel through reuse.
Works for me with the current code in SVN.
> We are using xercesImpl.jar that is distributed with xalan-j 2.7.1.
Whatever you got out of Xalan-J 2.7.1 would be very old now. Have you
tried Xerces-J 2.11.0?
Thanks.
Michael Glavassevich
XML Technologies and WAS Development
IBM Toronto Lab
E-mail: [email protected]
E-mail: [email protected]