Re: Doctype declarations in fragments

Wulf Berschin Wed, 16 Sep 2009 09:17:37 -0700

Hi Michael,

thank you for your response. I'll try your approach tomorrow...

Meanwhile I was playing a little bit with XNI and the ParserConfiguration and finally got Xerces to ignore this (indeed misplaced) construct. What I did is:


---

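import org.apache.xerces.parsers.AbstractSAXParser;

// SAX parser that plugs in the DOCTYPE-skipping configuration below.
public class FragmentDoctypeSkipParser
    extends AbstractSAXParser {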

    public FragmentDoctypeSkipParser() {
        super(new FragmentDoctypeSkipConfiguration());
    }
}

---

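import org.apache.xerces.parsers.StandardParserConfiguration;
import org.apache.xerces.xni.parser.XMLDocumentScanner;

// Configuration that swaps in the custom document scanner.
public class FragmentDoctypeSkipConfiguration extends
    StandardParserConfiguration
{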
  protected XMLDocumentScanner createDocumentScanner()
  {
    return new FragmentDoctypeSkipScannerImpl();
  }
}

---

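import java.io.IOException;

import org.apache.xerces.util.XMLChar;
import org.apache.xerces.util.XMLStringBuffer;
import org.apache.xerces.xni.XMLString;
import org.apache.xerces.xni.XNIException;

public class FragmentDoctypeSkipScannerImpl extends
    org.apache.xerces.impl.XMLDocumentScannerImpl
{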

  /** Creates a content dispatcher. */
  protected Dispatcher createContentDispatcher()
  {
    return new FDSContentDispatcher();
  }

  protected class FDSContentDispatcher extends ContentDispatcher
  {
    protected boolean scanForDoctypeHook() throws IOException, XNIException
    {
      XMLString xString = new XMLString();
      if (fEntityScanner.skipString("DOCTYPE")) {


        // spaces
        if (!fEntityScanner.skipSpaces()) {
          reportFatalError(
              "MSG_SPACE_REQUIRED_BEFORE_ROOT_ELEMENT_TYPE_IN_DOCTYPEDECL",
              null);
        }
        // root element name
        String doctypeName = fEntityScanner.scanName();

        String publicId = null;

        // external id
        if (fEntityScanner.skipSpaces()) {
          // scanExternalID(dtIds, false);

          if (fEntityScanner.skipString("PUBLIC")) {
            if (!fEntityScanner.skipSpaces()) {
              reportFatalError("SpaceRequiredAfterPUBLIC", null);
            }
            scanPubidLiteral(xString);
            publicId = xString.toString();

          }

          if (publicId != null || fEntityScanner.skipString("SYSTEM")) {
            if (publicId == null && !fEntityScanner.skipSpaces()) {
              reportFatalError("SpaceRequiredAfterSYSTEM", null);
            }
            fEntityScanner.skipSpaces();
            int quote = fEntityScanner.peekChar();
            if (quote != '\'' && quote != '"') {
              reportFatalError("QuoteRequiredInSystemID", null);
            }
            fEntityScanner.scanChar();
            if (fEntityScanner.scanLiteral(quote, xString) != quote) {
              XMLStringBuffer xsb = new XMLStringBuffer();
              xsb.clear();
              do {
                xsb.append(xString);
                int c = fEntityScanner.peekChar();
                if (XMLChar.isMarkup(c) || c == ']') {
                  xsb.append((char) fEntityScanner.scanChar());
                }

              } while (fEntityScanner.scanLiteral(quote, xString) != quote);

              xsb.append(xString);
              xString = xsb;
            }
            if (!fEntityScanner.skipChar(quote)) {
              reportFatalError("SystemIDUnterminated", null);
            }
          }
          fEntityScanner.skipSpaces();
        }

        if (fEntityScanner.skipChar('[')) {
          // has internal subset
          while ((char) fEntityScanner.scanChar() != ']') {
          }

        }
        fEntityScanner.skipSpaces();
        if (!fEntityScanner.skipChar('>')) {
          reportFatalError("DoctypedeclUnterminated",
              new Object[] { doctypeName });
        }
        fEntityScanner.skipSpaces();
        setScannerState(SCANNER_STATE_CONTENT);

        // undo the fMarkupDepth++ done in SCANNER_STATE_START_OF_MARKUP

        fMarkupDepth--;
        return true;
      }
      return false;

    }
  }
}

I tested this code and it works as wanted.

What do you think of this approach? Is scanForDoctypeHook() correctly implemented?

It seems to me that it might be easier to use this parser later for our XSL conversions too.


Greetings from Heidelberg

Wulf



Michael Glavassevich wrote:

Hi Wulf,

Wulf Berschin <[email protected]> wrote on 09/16/2009 02:48:52 AM:

 > Hi,
 >
 > for ease of editing we have a doctype declaration in each (file)
 > fragment. When I parse the full master (with resolving fragments) Xerces
 > throws a fatal error (Doctype not allowed in content) and goes in an
 > endless loop when setting this continue-after-fatal-error switch.
 >
 > How can I make Xerces ignore doctype declarations occurring in content
 > (alt. in the header of file entities)?
You can't. Xerces (or any conformant XML parser for that matter) will not ignore or skip over any malformed / misplaced constructs in the document. Parsers are required to report the fatal error. The "continue-after-fatal-error" feature which allows Xerces to keep going is unreliable and can lead to a catastrophic failure (e.g. NPE, infinite loop, stack overflow, out of memory, etc...) if you turn it on. It's to be used with extreme caution and should never be enabled in a finished component / product.
You either need to remove these DOCTYPEs from the files or filter them out at a lower level (e.g. a wrapper InputStream which doesn't return the DOCTYPE from read()).
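
For illustration, the "filter them out at a lower level" idea could look roughly like the sketch below. It is a simplified variant, not the streaming wrapper InputStream described above: it buffers the whole fragment, removes every DOCTYPE declaration, and hands back a Reader over the result. The class name DoctypeStripper and its methods are hypothetical and not part of Xerces. It assumes fragments are small enough to hold in memory, and it does not recognise comments or CDATA sections, so a "<!DOCTYPE" string inside one of those, or a quote character inside a comment in the internal subset, would confuse it.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

/** Hypothetical helper (not part of Xerces): strips DOCTYPE declarations from fragment text. */
public class DoctypeStripper {

    private static final String DOCTYPE_OPEN = "<!DOCTYPE";

    /** Returns the given XML text with every DOCTYPE declaration removed. */
    public static String stripDoctype(String xml) {
        StringBuilder out = new StringBuilder(xml.length());
        int pos = 0;
        while (true) {
            int start = xml.indexOf(DOCTYPE_OPEN, pos);
            int end = (start < 0) ? -1 : findDeclEnd(xml, start + DOCTYPE_OPEN.length());
            if (end < 0) {
                // No (complete) declaration left: copy the rest unchanged so the
                // parser still sees, and reports, an unterminated declaration.
                out.append(xml, pos, xml.length());
                return out.toString();
            }
            out.append(xml, pos, start); // keep the text before the declaration
            pos = end + 1;               // continue after the closing '>'
        }
    }

    /** Finds the '>' that closes the declaration, skipping quoted literals and [ ... ]. */
    private static int findDeclEnd(String xml, int from) {
        int depth = 0;  // greater than 0 while inside the internal subset
        char quote = 0; // the open quote character, or 0 if none
        for (int i = from; i < xml.length(); i++) {
            char c = xml.charAt(i);
            if (quote != 0) {
                if (c == quote) {
                    quote = 0;
                }
            } else if (c == '"' || c == '\'') {
                quote = c;
            } else if (c == '[') {
                depth++;
            } else if (c == ']') {
                depth--;
            } else if (c == '>' && depth <= 0) {
                return i;
            }
        }
        return -1; // unterminated declaration
    }

    /** Reads the fragment fully and returns a Reader over the stripped text. */
    public static Reader wrap(Reader in) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return new StringReader(stripDoctype(sb.toString()));
    }
}
```

One possible way to hook this in would be from an entity resolver that wraps the character stream of each fragment with DoctypeStripper.wrap(...) before handing it to the parser, so the parser itself stays unmodified and conformant.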
 > Wulf
 >
 > ---------------------------------------------------------------------
 > To unsubscribe, e-mail: [email protected]
 > For additional commands, e-mail: [email protected]

Thanks.

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: [email protected]




