We did not attempt to use a newer version of Xerces-J before, since we did
not want to break the Xalan-J distribution we are using. I'm not sure its
license even allows that, and having two JARs of the same library on the
classpath didn't seem like a good idea.
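For what it's worth, duplicate copies can at least be detected at run time. A minimal sketch, assuming a plain classpath (no module path, no custom class loaders); `ClasspathCopies` and `locate` are illustrative names, not Xerces or Xalan API:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

/**
 * Lists every classpath entry that provides a given class, which makes a
 * second copy of a library (for example two xercesImpl.jar files) visible.
 */
public final class ClasspathCopies {

    private ClasspathCopies() {}

    /** Returns the URL of every copy of the class file visible to this class loader. */
    public static List<URL> locate(String className) {
        // "a.b.C" -> "a/b/C.class", the resource path inside a JAR or directory
        String resource = className.replace('.', '/') + ".class";
        ClassLoader loader = ClasspathCopies.class.getClassLoader();
        try {
            // getResources also consults the parent loader, so every copy is reported
            return Collections.list(loader.getResources(resource));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<URL> copies =
            locate("org.apache.xerces.impl.xpath.regex.RegularExpression");
        System.out.println(copies.size() + " location(s) found:");
        for (URL url : copies) {
            System.out.println("  " + url);
        }
        if (copies.size() > 1) {
            System.out.println("More than one copy on the classpath.");
        }
    }
}
```

Run it with the Xalan-J JARs on the classpath; if more than one location is reported, the first one listed is normally the copy that actually gets loaded.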

Anyway, we can now use RegularExpression in a way that suits our use case.
I only wanted to point out that it started behaving in an undefined way
when an undefined option was used. I would have expected an unchecked
exception to be thrown, or some other kind of warning, in such cases, even
if this class is not meant to be part of the public API.
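Until something like that exists, the check can live on the caller's side. A minimal sketch, assuming XSD mode (`X`) is the only mode we ever need; `XsdOptionGuard` and `requireXsdMode` are hypothetical helpers, not part of Xerces:

```java
import java.util.Objects;

/**
 * Hypothetical caller-side guard (not part of Xerces): accepts only the "X"
 * option, which selects XML Schema regular expression mode, and rejects
 * anything else with an unchecked exception before Xerces ever sees it.
 */
public final class XsdOptionGuard {

    private XsdOptionGuard() {}

    /** Returns {@code options} unchanged if it is exactly "X"; throws otherwise. */
    public static String requireXsdMode(String options) {
        Objects.requireNonNull(options, "options");
        if (!"X".equals(options)) {
            throw new IllegalArgumentException(
                "Expected option \"X\" (XML Schema mode), got \"" + options + "\"");
        }
        return options;
    }

    public static void main(String[] args) {
        System.out.println(requireXsdMode("X")); // prints X
        try {
            requireXsdMode("x"); // lowercase typo is rejected here
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Calling `requireXsdMode(options)` right before `new RegularExpression(pattern, options)` turns a lowercase `x` typo into an immediate IllegalArgumentException instead of a confusing parse error later.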

Since (if I understood correctly) the example I posted works fine for you,
this issue has probably been fixed at some point, so this thread became
irrelevant as soon as you mentioned that. :P
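To confirm which Xerces-J build is actually in use (and so whether such a fix could be in it), the version can be printed at run time. A sketch that assumes the Xerces-J 2.x class `org.apache.xerces.impl.Version` with a static `getVersion()` method; it uses reflection so it compiles without Xerces, and `XercesVersionCheck` and `staticVersion` are illustrative names:

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Optional;

/**
 * Reports the Xerces-J version found at run time without a compile-time
 * dependency on Xerces. Assumes the Xerces-J 2.x class
 * org.apache.xerces.impl.Version and its static getVersion() method.
 */
public final class XercesVersionCheck {

    public static final String XERCES_VERSION_CLASS = "org.apache.xerces.impl.Version";

    private XercesVersionCheck() {}

    /**
     * Calls a public static no-argument getVersion() on the named class.
     * Returns empty if the class or method is missing or the call fails.
     */
    public static Optional<String> staticVersion(String className) {
        try {
            Class<?> cls = Class.forName(className);
            Method getVersion = cls.getMethod("getVersion");
            if (!Modifier.isStatic(getVersion.getModifiers())) {
                return Optional.empty(); // an instance method cannot be called without an object
            }
            Object result = getVersion.invoke(null);
            return result instanceof String ? Optional.of((String) result) : Optional.empty();
        } catch (ReflectiveOperationException | LinkageError e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(staticVersion(XERCES_VERSION_CLASS)
                .orElse("Xerces-J not found on the classpath"));
    }
}
```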

Thanks, Jernej


2013/8/14 Michael Glavassevich <mrgla...@ca.ibm.com>

> Hi Jernej,
>
> Jernej Tuljak <jernej.tul...@gmail.com> wrote on 08/14/2013 03:41:17 AM:
>
> > Hi,
> >
> > we're abusing org.apache.xerces.impl.xpath.regex.RegularExpression
>
> Yep. :-)
>
> > to validate XSD-flavor regular expression strings and later to match
> > test strings against them. It seemed to work, until someone tried
> > to use a very specific regex.
> >
> > Here's the code:
> >
> >     import org.apache.xerces.impl.xpath.regex.RegularExpression;
> >
> >     public class XercesRegexTest {
> >
> >         public static void main(String[] args) {
> >             String regexString = "([a-zA-Z][^ ]*)";
> >             RegularExpression regex = new RegularExpression(regexString, "x");
> >             System.out.println(regex.toString());
> >         }
> >
> >     }
> >
> > The `x` option is supposed to make the regex engine conform to XSD
> > regular expressions.
>
> Only 'X' does that. That is the only option which Xerces uses internally.
>
> > But if you run this code, you'll end up with
> >
> >     Exception in thread "main" org.apache.xerces.impl.xpath.regex.ParseException: Unexpected end of the pattern in a character class.
> >         at org.apache.xerces.impl.xpath.regex.RegexParser.ex(Unknown Source)
> >         at org.apache.xerces.impl.xpath.regex.RegexParser.parseCharacterClass(Unknown Source)
> >         at org.apache.xerces.impl.xpath.regex.RegexParser.parseAtom(Unknown Source)
> >         at org.apache.xerces.impl.xpath.regex.RegexParser.parseFactor(Unknown Source)
> >         at org.apache.xerces.impl.xpath.regex.RegexParser.parseTerm(Unknown Source)
> >         at org.apache.xerces.impl.xpath.regex.RegexParser.parseRegex(Unknown Source)
> >         at org.apache.xerces.impl.xpath.regex.RegexParser.processParen(Unknown Source)
> >         at org.apache.xerces.impl.xpath.regex.RegexParser.parseAtom(Unknown Source)
> >         at org.apache.xerces.impl.xpath.regex.RegexParser.parseFactor(Unknown Source)
> >         at org.apache.xerces.impl.xpath.regex.RegexParser.parseTerm(Unknown Source)
> >         at org.apache.xerces.impl.xpath.regex.RegexParser.parseRegex(Unknown Source)
> >         at org.apache.xerces.impl.xpath.regex.RegexParser.parse(Unknown Source)
> >         at org.apache.xerces.impl.xpath.regex.RegularExpression.setPattern(Unknown Source)
> >         at org.apache.xerces.impl.xpath.regex.RegularExpression.setPattern(Unknown Source)
> >         at org.apache.xerces.impl.xpath.regex.RegularExpression.<init>(Unknown Source)
> >         at com.mgsoft.testing.regex.XercesRegexTest.main(XercesRegexTest.java:9)
> >     Java Result: 1
> >
> > It first looked like a bug in Xerces' regular expression parser, but
> > after re-reading the documentation of this class
> > (http://xerces.apache.org/xerces-j/apiDocs/org/apache/xerces/utils/regex/RegularExpression.html),
> > I found out that the `x` option should actually be `X` (upper case).
>
> The docs for that class probably haven't changed much over the years, but
> it's worth pointing out that that's the Xerces-J 1.x documentation, not
> Xerces-J 2.x.
>
> > The thing is, it worked for countless other regular expressions. In fact,
> > it is that space that causes the problem; any other character works fine.
> > Removing the option and using the single-argument constructor of
> > `RegularExpression` also works fine.
>
> If you're not specifying 'X' then you're using a mode that isn't XSD and
> that we never use.
>
> > Does anyone know why this is happening? I realize that this class is
> > probably not intended for such usage, but since the spec we're
> > implementing uses XSD regular expressions, we tried to avoid
> > reinventing the wheel through reuse.
>
> Works for me with the current code in SVN.
>
> > We are using the xercesImpl.jar that is distributed with Xalan-J 2.7.1.
>
> Whatever you got out of Xalan-J 2.7.1 would be very old now. Have you
> tried Xerces-J 2.11.0?
>
> Thanks.
>
> Michael Glavassevich
> XML Technologies and WAS Development
> IBM Toronto Lab
> E-mail: mrgla...@ca.ibm.com
> E-mail: mrgla...@apache.org
>
>