We did not try a newer version of Xerces-J before, since we did not want to break the Xalan-J distribution we are using. I'm not sure its license even allows that, and having two JARs of the same thing on the classpath didn't seem like a good idea.

Anyways, we can use RegularExpression in a way that suits our use case now. I only wanted to point out that it started behaving in an undefined way when an undefined option was used. I would have expected an unchecked exception to be thrown, or some other kind of warning, in such cases, even if this class is not supposed to be part of the public API. Since (if I understood correctly) the example I posted works fine for you, this issue might already have been fixed, so this thread became irrelevant as soon as you mentioned that. :P

Thanks,
Jernej

2013/8/14 Michael Glavassevich <mrgla...@ca.ibm.com>

> Hi Jernej,
>
> Jernej Tuljak <jernej.tul...@gmail.com> wrote on 08/14/2013 03:41:17 AM:
>
> > Hi,
> >
> > we're abusing org.apache.xerces.impl.xpath.regex.RegularExpression
>
> Yep. :-)
>
> > to validate XSD flavor regular expression strings and later matching
> > test strings against them. It seemingly worked, until someone tried
> > to use a very specific regex.
> >
> > Here's the code:
> >
> > import org.apache.xerces.impl.xpath.regex.RegularExpression;
> >
> > public class XercesRegexTest {
> >
> >     public static void main(String[] args) {
> >         String regexString = "([a-zA-Z][^ ]*)";
> >         RegularExpression regex = new RegularExpression(regexString, "x");
> >         System.out.println(regex.toString());
> >     }
> >
> > }
> >
> > The `x` option is supposed to make the regex engine conform to XSD
> > regular expressions.
>
> Only 'X' does that. That is the only option which Xerces uses internally.
>
> > But if you run this code, you'll end up with
> >
> > Exception in thread "main"
> > org.apache.xerces.impl.xpath.regex.ParseException: Unexpected end of
> > the pattern in a character class.
> >     at org.apache.xerces.impl.xpath.regex.RegexParser.ex(Unknown Source)
> >     at org.apache.xerces.impl.xpath.regex.RegexParser.parseCharacterClass(Unknown Source)
> >     at org.apache.xerces.impl.xpath.regex.RegexParser.parseAtom(Unknown Source)
> >     at org.apache.xerces.impl.xpath.regex.RegexParser.parseFactor(Unknown Source)
> >     at org.apache.xerces.impl.xpath.regex.RegexParser.parseTerm(Unknown Source)
> >     at org.apache.xerces.impl.xpath.regex.RegexParser.parseRegex(Unknown Source)
> >     at org.apache.xerces.impl.xpath.regex.RegexParser.processParen(Unknown Source)
> >     at org.apache.xerces.impl.xpath.regex.RegexParser.parseAtom(Unknown Source)
> >     at org.apache.xerces.impl.xpath.regex.RegexParser.parseFactor(Unknown Source)
> >     at org.apache.xerces.impl.xpath.regex.RegexParser.parseTerm(Unknown Source)
> >     at org.apache.xerces.impl.xpath.regex.RegexParser.parseRegex(Unknown Source)
> >     at org.apache.xerces.impl.xpath.regex.RegexParser.parse(Unknown Source)
> >     at org.apache.xerces.impl.xpath.regex.RegularExpression.setPattern(Unknown Source)
> >     at org.apache.xerces.impl.xpath.regex.RegularExpression.setPattern(Unknown Source)
> >     at org.apache.xerces.impl.xpath.regex.RegularExpression.<init>(Unknown Source)
> >     at com.mgsoft.testing.regex.XercesRegexTest.main(XercesRegexTest.java:9)
> > Java Result: 1
> >
> > It first looked like a bug in Xerces' regular expression parser, but
> > after re-reading the documentation
> > (http://xerces.apache.org/xerces-j/apiDocs/org/apache/xerces/utils/regex/RegularExpression.html)
> > of this class, I found out that the `x` option should actually be `X`
> > (upper case).
>
> The docs for that class probably haven't changed much over the years but
> worth pointing out that that's the Xerces-J 1.x documentation not Xerces-J
> 2.x.
>
> > Thing is...it worked for countless other regular
> > expressions. In fact it is that space that is causing problems, any
> > other char works fine. Also removing the option and using the single
> > string constructor of `RegularExpression` works fine.
>
> If you're not specifying 'X' then you're using a mode that isn't XSD and
> that we never use.
>
> > Does anyone know why this is happening? I realize that this class is
> > probably not intended for such usage, but since the spec we're
> > implementing uses XSD regular expressions, we tried to avoid
> > reinventing the wheel through re-usage.
>
> Works for me with the current code in SVN.
>
> > We are using xercesImpl.jar that is distributed with xalan-j 2.7.1.
>
> Whatever you got out of Xalan-J 2.7.1 would be very old now. Have you
> tried Xerces-J 2.11.0?
>
> Thanks.
>
> Michael Glavassevich
> XML Technologies and WAS Development
> IBM Toronto Lab
> E-mail: mrgla...@ca.ibm.com
> E-mail: mrgla...@apache.org
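
On the point about unexpected options: below is a minimal sketch of a caller-side guard that fails fast before the options string ever reaches `RegularExpression`. It assumes we only ever want the `X` (XML Schema) option, since that is the only one Xerces is said to use internally. `RegexOptionGuard` and `checkedOptions` are hypothetical names for illustration and are not part of Xerces.

```java
public class RegexOptionGuard {

    // Assumption: 'X' (XML Schema mode) is the only option we want to allow.
    private static final String ALLOWED_OPTIONS = "X";

    // Returns the options string unchanged if every character is allowed,
    // otherwise throws an unchecked exception naming the offending character.
    static String checkedOptions(String options) {
        for (int i = 0; i < options.length(); i++) {
            char c = options.charAt(i);
            if (ALLOWED_OPTIONS.indexOf(c) < 0) {
                throw new IllegalArgumentException(
                        "Unsupported regex option '" + c + "' in \"" + options + "\"");
            }
        }
        return options;
    }

    public static void main(String[] args) {
        // Accepted: upper-case 'X'.
        System.out.println(checkedOptions("X"));

        // Rejected: lower-case 'x', the typo from the example in this thread.
        try {
            checkedOptions("x");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With xercesImpl.jar on the classpath, the checked value would then be passed along as `new RegularExpression(regexString, RegexOptionGuard.checkedOptions("X"))`, so a lower-case typo is caught on our side regardless of how a given Xerces version treats it.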