RE: Doctype declarations in fragments

Gary Gregory Wed, 16 Sep 2009 11:39:40 -0700

For the curious:

It has been, almost to the day, *two* years since the release of 2.9.1, so the 
"if one of us modifies or removes..." is a big if :) as far as changes showing 
up in a new release. Unless the longer it has been, the more likely a new 
release is...


Are we to expect a release? Since 2.9.1 (9/14/2007):

19 issues have been marked "Resolved"; of those, 13 are marked "Fixed":

https://issues.apache.org/jira/sr/jira.issueviews:searchrequest-printable/temp/SearchRequest.html?pid=10520&resolution=1&customfield_12310221%3Aafter=14%2FSep%2F07&sorter/field=issuekey&sorter/order=DESC&tempMax=1000

There are 206 issues reported "Unscheduled".

Luckily for us, we have not been bitten by any bugs, which is a testament to 
Xerces. OTOH, it's been a real pain not having XPath 2.0 and XSLT 2.0 in Xalan.

Gary Gregory
Senior Software Engineer
Seagull Software
ggreg...@seagullsoftware.com
www.seagullsoftware.com
________________________________
From: Michael Glavassevich [mrgla...@ca.ibm.com]
Sent: Wednesday, September 16, 2009 10:27 AM
To: j-users@xerces.apache.org
Subject: Re: Doctype declarations in fragments


Hi Wulf,

Haven't looked at the specifics of what you did but I'm never fond of any 
solution that involves extending or hooking into Xerces' internals. I'm not 
referring to XNI itself (which is a stable Xerces API) but rather the internal 
implementation classes you've chosen to use / extend. Your code could break at 
any time in the future if one of us modifies or removes any of those classes / 
methods.
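
For comparison, the lower-level filtering idea quoted further down in this thread (keep the DOCTYPE away from the parser instead of teaching the parser to skip it) needs no Xerces classes at all. What follows is only a rough sketch under stated assumptions, not a tested solution: the class and method names (DoctypeFilter, stripDoctype, filter) are made up for illustration, it buffers the whole fragment and works on characters rather than wrapping an InputStream, it removes only the first DOCTYPE it finds, and it does not understand comments or processing instructions inside an internal subset.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical helper, not part of Xerces: removes the first DOCTYPE
// declaration from a fragment before the text is handed to a parser.
final class DoctypeFilter {

    /** Returns the text with its first DOCTYPE declaration removed, if any. */
    public static String stripDoctype(String xml) {
        int start = xml.indexOf("<!DOCTYPE");
        if (start < 0) {
            return xml; // nothing to remove
        }
        char quote = 0;           // open quote character, or 0 outside a literal
        boolean inSubset = false; // true between '[' and ']' of an internal subset
        for (int i = start + "<!DOCTYPE".length(); i < xml.length(); i++) {
            char c = xml.charAt(i);
            if (quote != 0) {
                if (c == quote) {
                    quote = 0; // end of a quoted literal
                }
            } else if (c == '"' || c == '\'') {
                quote = c;
            } else if (c == '[') {
                inSubset = true;
            } else if (c == ']') {
                inSubset = false;
            } else if (c == '>' && !inSubset) {
                // End of the declaration: keep everything around it.
                return xml.substring(0, start) + xml.substring(i + 1);
            }
        }
        // Unterminated declaration: leave the input alone so that the
        // parser still reports the error.
        return xml;
    }

    /** Buffers the whole fragment and returns a Reader over the filtered text. */
    public static Reader filter(Reader in) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return new StringReader(stripDoctype(sb.toString()));
    }
}
```

Presumably the filtered Reader would be wrapped in an org.xml.sax.InputSource and returned from whatever entity resolver loads the fragments, so the parser never sees the misplaced declaration. A streaming variant that works on bytes would also have to deal with the fragment's encoding, which this sketch sidesteps.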

Thanks.

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrgla...@ca.ibm.com
E-mail: mrgla...@apache.org

Wulf Berschin <bersc...@dosco.de> wrote on 09/16/2009 12:16:11 PM:

> Hi Michael,
>
> thank you for your response. I'll try your approach tomorrow...
>
> Meanwhile I was playing a little bit with XNI and the Parser
> Configuration and finally got Xerces to ignore this, indeed misplaced,
> construct. What I did is:
>
> ---
>
> import org.apache.xerces.parsers.AbstractSAXParser;
>
> public class FragmentDoctypeSkipParser
>      extends AbstractSAXParser {
>
>      public FragmentDoctypeSkipParser() {
>          super(new FragmentDoctypeSkipConfiguration());
>      }
> }
>
> ---
>
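> import org.apache.xerces.parsers.StandardParserConfiguration;
> import org.apache.xerces.xni.parser.XMLDocumentScanner;
>
> public class FragmentDoctypeSkipConfiguration extends
>      StandardParserConfiguration
> {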
>    protected XMLDocumentScanner createDocumentScanner()
>    {
>      return new FragmentDoctypeSkipScannerImpl();
>    }
> }
>
> ---
>
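> import java.io.IOException;
>
> import org.apache.xerces.util.XMLChar;
> import org.apache.xerces.util.XMLStringBuffer;
> import org.apache.xerces.xni.XMLString;
> import org.apache.xerces.xni.XNIException;
>
> public class FragmentDoctypeSkipScannerImpl extends
>      org.apache.xerces.impl.XMLDocumentScannerImpl
> {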
>
>    /** Creates a content dispatcher. */
>    protected Dispatcher createContentDispatcher()
>    {
>      return new FDSContentDispatcher();
>    }
>
>    protected class FDSContentDispatcher extends ContentDispatcher
>    {
>      protected boolean scanForDoctypeHook() throws IOException, XNIException
>      {
>        XMLString xString = new XMLString();
>        if (fEntityScanner.skipString("DOCTYPE")) {
>
>
>          // spaces
>          if (!fEntityScanner.skipSpaces()) {
>            reportFatalError(
>                "MSG_SPACE_REQUIRED_BEFORE_ROOT_ELEMENT_TYPE_IN_DOCTYPEDECL",
>                null);
>          }
>          // root element name
>          String doctypeName = fEntityScanner.scanName();
>
>          String publicId = null;
>
>          // external id
>          if (fEntityScanner.skipSpaces()) {
>            // scanExternalID(dtIds, false);
>
>            if (fEntityScanner.skipString("PUBLIC")) {
>              if (!fEntityScanner.skipSpaces()) {
>                reportFatalError("SpaceRequiredAfterPUBLIC", null);
>              }
>              scanPubidLiteral(xString);
>              publicId = xString.toString();
>
>            }
>
>            if (publicId != null || fEntityScanner.skipString("SYSTEM")) {
>              if (publicId == null && !fEntityScanner.skipSpaces()) {
>                reportFatalError("SpaceRequiredAfterSYSTEM", null);
>              }
>              fEntityScanner.skipSpaces();
>              int quote = fEntityScanner.peekChar();
>              if (quote != '\'' && quote != '"') {
>                reportFatalError("QuoteRequiredInSystemID", null);
>              }
>              fEntityScanner.scanChar();
>              if (fEntityScanner.scanLiteral(quote, xString) != quote) {
>                XMLStringBuffer xsb = new XMLStringBuffer();
>                xsb.clear();
>                do {
>                  xsb.append(xString);
>                  int c = fEntityScanner.peekChar();
>                  if (XMLChar.isMarkup(c) || c == ']') {
>                    xsb.append((char) fEntityScanner.scanChar());
>                  }
>                } while (fEntityScanner.scanLiteral(quote, xString) != quote);
>                xsb.append(xString);
>                xString = xsb;
>              }
>              if (!fEntityScanner.skipChar(quote)) {
>                reportFatalError("SystemIDUnterminated", null);
>              }
>            }
>            fEntityScanner.skipSpaces();
>          }
>
>          if (fEntityScanner.skipChar('[')) {
>            // has internal subset
>            while ((char) fEntityScanner.scanChar() != ']') {
>            }
>
>          }
>          fEntityScanner.skipSpaces();
>          if (!fEntityScanner.skipChar('>')) {
>            reportFatalError("DoctypedeclUnterminated",
>                new Object[] { doctypeName });
>          }
>          fEntityScanner.skipSpaces();
>          setScannerState(SCANNER_STATE_CONTENT);
>
>          // undo SCANNER_STATE_START_OF_MARKUP: {    fMarkupDepth++;
>
>          fMarkupDepth--;
>          return true;
>        }
>        return false;
>
>      }
>    }
> }
>
> I tested this code and it works as wanted.
>
> What do you think of this approach? Is scanForDoctypeHook()
> correctly implemented?
>
> It seems to me that it might be easier to use this parser later for our
> XSL conversions too.
>
> Greetings from Heidelberg
>
> Wulf
>
>
>
> Michael Glavassevich schrieb:
> > Hi Wulf,
> >
> > Wulf Berschin <bersc...@dosco.de> wrote on 09/16/2009 02:48:52 AM:
> >
> >  > Hi,
> >  >
> >  > for ease of editing we have a doctype declaration in each (file)
> >  > fragment. When I parse the full master (with the fragments resolved),
> >  > Xerces throws a fatal error (Doctype not allowed in content) and goes
> >  > into an endless loop when the continue-after-fatal-error switch is set.
> >  >
> >  > How can I make Xerces ignore doctype declarations occurring in content
> >  > (or, alternatively, in the header of file entities)?
> >
> > You can't. Xerces (or any conformant XML parser for that matter) will
> > not ignore or skip over any malformed / misplaced constructs in the
> > document. Parsers are required to report the fatal error. The
> > "continue-after-fatal-error" feature which allows Xerces to keep going
> > is unreliable and can lead to a catastrophic failure (e.g. NPE, infinite
> > loop, stack overflow, out of memory, etc...) if you turn it on. It's to
> > be used with extreme caution and should never be enabled in a finished
> > component / product.
> >
> > You either need to remove these DOCTYPEs from the files or filter them
> > out at a lower level (e.g. a wrapper InputStream which doesn't return
> > the DOCTYPE from read()).
> >
> >  > Wulf
> >  >
> >  > ---------------------------------------------------------------------
> >  > To unsubscribe, e-mail: j-users-unsubscr...@xerces.apache.org
> >  > For additional commands, e-mail: j-users-h...@xerces.apache.org
> >
> > Thanks.
> >
> > Michael Glavassevich
> > XML Parser Development
> > IBM Toronto Lab
> > E-mail: mrgla...@ca.ibm.com
> > E-mail: mrgla...@apache.org
> >
>

