Re: Doctype declarations in fragments

Michael Glavassevich Wed, 16 Sep 2009 10:28:33 -0700

Hi Wulf,

Haven't looked at the specifics of what you did but I'm never fond of any
solution that involves extending or hooking into Xerces' internals. I'm not
referring to XNI itself (which is a stable Xerces API) but rather the
internal implementation classes you've chosen to use / extend. Your code
could break at any time in the future if one of us modifies or removes any
of those classes / methods.


Thanks.

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrgla...@ca.ibm.com
E-mail: mrgla...@apache.org

Wulf Berschin <bersc...@dosco.de> wrote on 09/16/2009 12:16:11 PM:

> Hi Michael,
>
> thank you for your response. I'll try your approach tomorrow...
>
> Meanwhile I was playing a little bit with XNI and the Parser
> Configuration and finally got Xerces to ignore this, indeed misplaced,
> construct. What I did is:
>
> ---
>
> public class FragmentDoctypeSkipParser
>      extends AbstractSAXParser {
>
>      public FragmentDoctypeSkipParser() {
>          super(new FragmentDoctypeSkipConfiguration());
>      }
> }
>
> ---
>
> public class FragmentDoctypeSkipConfiguration extends
>      StandardParserConfiguration
> {
>    protected XMLDocumentScanner createDocumentScanner()
>    {
>      return new FragmentDoctypeSkipScannerImpl();
>    }
> }
>
> ---
>
> public class FragmentDoctypeSkipScannerImpl extends
>      org.apache.xerces.impl.XMLDocumentScannerImpl
> {
>
>    /** Creates a content dispatcher. */
>    protected Dispatcher createContentDispatcher()
>    {
>      return new FDSContentDispatcher();
>    }
>
>    protected class FDSContentDispatcher extends ContentDispatcher
>    {
>      protected boolean scanForDoctypeHook() throws IOException,
XNIException
>      {
>        XMLString xString = new XMLString();
>        if (fEntityScanner.skipString("DOCTYPE")) {
>
>
>          // spaces
>          if (!fEntityScanner.skipSpaces()) {
>            reportFatalError(
>
"MSG_SPACE_REQUIRED_BEFORE_ROOT_ELEMENT_TYPE_IN_DOCTYPEDECL",
>                null);
>          }
>          // root element name
>          String doctypeName = fEntityScanner.scanName();
>
>          String publicId = null;
>
>          // external id
>          if (fEntityScanner.skipSpaces()) {
>            // scanExternalID(dtIds, false);
>
>            if (fEntityScanner.skipString("PUBLIC")) {
>              if (!fEntityScanner.skipSpaces()) {
>                reportFatalError("SpaceRequiredAfterPUBLIC", null);
>              }
>              scanPubidLiteral(xString);
>              publicId = xString.toString();
>
>            }
>
>            if (publicId != null || fEntityScanner.skipString("SYSTEM")) {
>              if (publicId == null && !fEntityScanner.skipSpaces()) {
>                reportFatalError("SpaceRequiredAfterSYSTEM", null);
>              }
>              fEntityScanner.skipSpaces();
>              int quote = fEntityScanner.peekChar();
>              if (quote != '\'' && quote != '"') {
>                reportFatalError("QuoteRequiredInSystemID", null);
>              }
>              fEntityScanner.scanChar();
>              if (fEntityScanner.scanLiteral(quote, xString) != quote) {
>                XMLStringBuffer xsb = new XMLStringBuffer();
>                xsb.clear();
>                do {
>                  xsb.append(xString);
>                  int c = fEntityScanner.peekChar();
>                  if (XMLChar.isMarkup(c) || c == ']') {
>                    xsb.append((char) fEntityScanner.scanChar());
>                  }
>                } while (fEntityScanner.scanLiteral(quote, xString) !=
> quote);
>                xsb.append(xString);
>                xString = xsb;
>              }
>              if (!fEntityScanner.skipChar(quote)) {
>                reportFatalError("SystemIDUnterminated", null);
>              }
>            }
>            fEntityScanner.skipSpaces();
>          }
>
>          if (fEntityScanner.skipChar('[')) {
>            // has internal subset
>            while ((char) fEntityScanner.scanChar() != ']') {
>            }
>
>          }
>          fEntityScanner.skipSpaces();
>          if (!fEntityScanner.skipChar('>')) {
>            reportFatalError("DoctypedeclUnterminated",
>                new Object[] { doctypeName });
>          }
>          fEntityScanner.skipSpaces();
>          setScannerState(SCANNER_STATE_CONTENT);
>
>          // undo SCANNER_STATE_START_OF_MARKUP: {    fMarkupDepth++;
>
>          fMarkupDepth--;
>          return true;
>        }
>        return false;
>
>      }
>    }
> }
>
> I tested this code and it works as wanted.
>
> How do you think about this approach? Is the scanForDoctypeHook()
> correctly implemented
>
> It seems to me that it might be easier to use this parser later for our
> XSL conversions too.
>
> Greetings from Heidelberg
>
> Wulf
>
>
>
> Michael Glavassevich schrieb:
> > Hi Wulf,
> >
> > Wulf Berschin <bersc...@dosco.de> wrote on 09/16/2009 02:48:52 AM:
> >
> >  > Hi,
> >  >
> >  > for ease of editing we have a doctype declaration in each (file)
> >  > fragment. When I parse the full master (with resolving fragments)
Xerces
> >  > throws a fatal error (Doctype not allowed in content) and goes in an
> >  > endless loop when setting this continue-after-fatal-error switch.
> >  >
> >  > How can make Xerces to ignore doctype declarations ocurring in
content
> >  > (alt. in the header of file entities)?
> >
> > You can't. Xerces (or any conformant XML parser for that matter) will
> > not ignore or skip over any malformed / misplaced constructs in the
> > document. Parsers are required to report the fatal error. The
> > "continue-after-fatal-error" feature which allows Xerces to keep going
> > is unreliable and can lead to a catastrophic failure (e.g. NPE,
infinite
> > loop, stack overflow, out of memory, etc...) if you turn it on. It's to

> > be used with extreme caution and should never be enabled in a finished
> > component / product.
> >
> > You either need to remove these DOCTYPEs from the files or filter them
> > out at a lower level (e.g. a wrapper InputStream which doesn't return
> > the DOCTYPE from read()).
> >
> >  > Wulf
> >  >
> >  >
---------------------------------------------------------------------
> >  > To unsubscribe, e-mail: j-users-unsubscr...@xerces.apache.org
> >  > For additional commands, e-mail: j-users-h...@xerces.apache.org
> >
> > Thanks.
> >
> > Michael Glavassevich
> > XML Parser Development
> > IBM Toronto Lab
> > E-mail: mrgla...@ca.ibm.com
> > E-mail: mrgla...@apache.org
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: j-users-unsubscr...@xerces.apache.org
> For additional commands, e-mail: j-users-h...@xerces.apache.org

Re: Doctype declarations in fragments

Reply via email to