Hi, we're abusing org.apache.xerces.impl.xpath.regex.RegularExpression to validate XSD-flavor regular expression strings and later match test strings against them. It seemed to work until someone tried a very specific regex.
Here's the code:

    import org.apache.xerces.impl.xpath.regex.RegularExpression;

    public class XercesRegexTest {
        public static void main(String[] args) {
            String regexString = "([a-zA-Z][^ ]*)";
            RegularExpression regex = new RegularExpression(regexString, "x");
            System.out.println(regex.toString());
        }
    }

The `x` option is supposed to make the regex engine conform to XSD regular expressions. But if you run this code, you'll end up with:

    Exception in thread "main" org.apache.xerces.impl.xpath.regex.ParseException: Unexpected end of the pattern in a character class.
        at org.apache.xerces.impl.xpath.regex.RegexParser.ex(Unknown Source)
        at org.apache.xerces.impl.xpath.regex.RegexParser.parseCharacterClass(Unknown Source)
        at org.apache.xerces.impl.xpath.regex.RegexParser.parseAtom(Unknown Source)
        at org.apache.xerces.impl.xpath.regex.RegexParser.parseFactor(Unknown Source)
        at org.apache.xerces.impl.xpath.regex.RegexParser.parseTerm(Unknown Source)
        at org.apache.xerces.impl.xpath.regex.RegexParser.parseRegex(Unknown Source)
        at org.apache.xerces.impl.xpath.regex.RegexParser.processParen(Unknown Source)
        at org.apache.xerces.impl.xpath.regex.RegexParser.parseAtom(Unknown Source)
        at org.apache.xerces.impl.xpath.regex.RegexParser.parseFactor(Unknown Source)
        at org.apache.xerces.impl.xpath.regex.RegexParser.parseTerm(Unknown Source)
        at org.apache.xerces.impl.xpath.regex.RegexParser.parseRegex(Unknown Source)
        at org.apache.xerces.impl.xpath.regex.RegexParser.parse(Unknown Source)
        at org.apache.xerces.impl.xpath.regex.RegularExpression.setPattern(Unknown Source)
        at org.apache.xerces.impl.xpath.regex.RegularExpression.setPattern(Unknown Source)
        at org.apache.xerces.impl.xpath.regex.RegularExpression.<init>(Unknown Source)
        at com.mgsoft.testing.regex.XercesRegexTest.main(XercesRegexTest.java:9)
    Java Result: 1

At first it looked like a bug in Xerces' regular expression parser, but after re-reading the documentation (http://xerces.apache.org/xerces-j/apiDocs/org/apache/xerces/utils/regex/RegularExpression.html) of this class, I found out that the `x` option should actually be `X` (upper case). The thing is, lower-case `x` worked for countless other regular expressions. In fact, it is the space that causes the problem; any other character in its place works fine. Removing the option and using the single-string constructor of `RegularExpression` also works fine.

Does anyone know why this is happening? I realize that this class is probably not intended for such usage, but since the spec we're implementing uses XSD regular expressions, we tried to avoid reinventing the wheel through reuse. We are using the xercesImpl.jar that is distributed with xalan-j 2.7.1.
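For comparison, here is a small probe that uses `java.util.regex` instead of Xerces, so it is only an analogy and not a reproduction of the problem above. In `java.util.regex`, the `Pattern.COMMENTS` flag (the usual meaning of a lower-case `x` flag in other engines) makes the parser skip whitespace in the pattern, which as far as I can tell includes the space inside `[^ ]`. If lower-case `x` in Xerces turns out to be a similar whitespace-ignoring mode (an assumption I have not confirmed against the Xerces source), that would fit the observation that only the space triggers the error. The class name `FreeSpacingProbe` and the helper `compiles` are made up for this sketch.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class FreeSpacingProbe {

    // Returns true if java.util.regex accepts the pattern with the given flags.
    static boolean compiles(String regex, int flags) {
        try {
            Pattern.compile(regex, flags);
            return true;
        } catch (PatternSyntaxException e) {
            // Report why the pattern was rejected.
            System.out.println("rejected: " + e.getDescription());
            return false;
        }
    }

    public static void main(String[] args) {
        // Same pattern as in the Xerces test above.
        String regexString = "([a-zA-Z][^ ]*)";

        // Default mode: the space is an ordinary member of the character class.
        System.out.println("no flags: " + compiles(regexString, 0));

        // Free-spacing mode: whitespace in the pattern is skipped by the parser.
        System.out.println("COMMENTS: " + compiles(regexString, Pattern.COMMENTS));
    }
}
```

If the two calls give different results here, it at least shows that a whitespace-ignoring flag is enough to break this particular pattern in one engine, which makes the `x` versus `X` mix-up a plausible place to look.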