Hi Michael,

Thank you for your response. I'll try your approach tomorrow.

Meanwhile, I experimented a bit with XNI and the parser configuration and finally got Xerces to ignore this (admittedly misplaced) construct. Here is what I did:

---

import org.apache.xerces.parsers.AbstractSAXParser;

public class FragmentDoctypeSkipParser
    extends AbstractSAXParser {

    public FragmentDoctypeSkipParser() {
        super(new FragmentDoctypeSkipConfiguration());
    }
}

---

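import org.apache.xerces.parsers.StandardParserConfiguration;
import org.apache.xerces.xni.parser.XMLDocumentScanner;

public class FragmentDoctypeSkipConfiguration extends
    StandardParserConfiguration
{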
  protected XMLDocumentScanner createDocumentScanner()
  {
    return new FragmentDoctypeSkipScannerImpl();
  }
}

---

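import java.io.IOException;

import org.apache.xerces.util.XMLChar;
import org.apache.xerces.util.XMLStringBuffer;
import org.apache.xerces.xni.XMLString;
import org.apache.xerces.xni.XNIException;

public class FragmentDoctypeSkipScannerImpl extends
    org.apache.xerces.impl.XMLDocumentScannerImpl
{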

  /** Creates a content dispatcher. */
  protected Dispatcher createContentDispatcher()
  {
    return new FDSContentDispatcher();
  }

  protected class FDSContentDispatcher extends ContentDispatcher
  {
    protected boolean scanForDoctypeHook() throws IOException, XNIException
    {
      XMLString xString = new XMLString();
      if (fEntityScanner.skipString("DOCTYPE")) {


        // spaces
        if (!fEntityScanner.skipSpaces()) {
          reportFatalError(
              "MSG_SPACE_REQUIRED_BEFORE_ROOT_ELEMENT_TYPE_IN_DOCTYPEDECL",
              null);
        }
        // root element name
        String doctypeName = fEntityScanner.scanName();

        String publicId = null;

        // external id
        if (fEntityScanner.skipSpaces()) {
          // scanExternalID(dtIds, false);

          if (fEntityScanner.skipString("PUBLIC")) {
            if (!fEntityScanner.skipSpaces()) {
              reportFatalError("SpaceRequiredAfterPUBLIC", null);
            }
            scanPubidLiteral(xString);
            publicId = xString.toString();

          }

          if (publicId != null || fEntityScanner.skipString("SYSTEM")) {
            if (publicId == null && !fEntityScanner.skipSpaces()) {
              reportFatalError("SpaceRequiredAfterSYSTEM", null);
            }
            fEntityScanner.skipSpaces();
            int quote = fEntityScanner.peekChar();
            if (quote != '\'' && quote != '"') {
              reportFatalError("QuoteRequiredInSystemID", null);
            }
            fEntityScanner.scanChar();
            if (fEntityScanner.scanLiteral(quote, xString) != quote) {
              XMLStringBuffer xsb = new XMLStringBuffer();
              xsb.clear();
              do {
                xsb.append(xString);
                int c = fEntityScanner.peekChar();
                if (XMLChar.isMarkup(c) || c == ']') {
                  xsb.append((char) fEntityScanner.scanChar());
                }
              } while (fEntityScanner.scanLiteral(quote, xString) != quote);
              xsb.append(xString);
              xString = xsb;
            }
            if (!fEntityScanner.skipChar(quote)) {
              reportFatalError("SystemIDUnterminated", null);
            }
          }
          fEntityScanner.skipSpaces();
        }

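        if (fEntityScanner.skipChar('[')) {
          // Has an internal subset: skip up to the first ']'. Note that a ']'
          // inside a quoted literal or a comment would end the skip too early.
          while ((char) fEntityScanner.scanChar() != ']') {
          }
        }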
        fEntityScanner.skipSpaces();
        if (!fEntityScanner.skipChar('>')) {
          reportFatalError("DoctypedeclUnterminated",
              new Object[] { doctypeName });
        }
        fEntityScanner.skipSpaces();
        setScannerState(SCANNER_STATE_CONTENT);

        // Undo the fMarkupDepth++ performed on entry to
        // SCANNER_STATE_START_OF_MARKUP, since this markup is discarded.

        fMarkupDepth--;
        return true;
      }
      return false;

    }
  }
}

I tested this code and it works as intended.

What do you think of this approach? Is scanForDoctypeHook() implemented correctly?

It seems to me that it might be easier to use this parser later for our XSL conversions too.
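
As a rough sketch of that idea: AbstractSAXParser implements org.xml.sax.XMLReader, so an instance of the parser above could presumably be handed to JAXP through a SAXSource. The example below uses only JDK classes so that it runs without Xerces on the classpath. The class ReaderBackedTransform and its methods are hypothetical names, and the identity transform merely stands in for a real stylesheet.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

// Hypothetical helper: runs a transform whose input is parsed by a caller-supplied XMLReader.
final class ReaderBackedTransform {

    /** Parses xml with the given reader and serializes it through an identity transform. */
    static String transform(XMLReader reader, String xml) {
        try {
            // Identity transformer; a real conversion would pass a stylesheet Source here.
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            SAXSource source = new SAXSource(reader, new InputSource(new StringReader(xml)));
            identity.transform(source, new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Stand-in for the custom parser: the JDK's default namespace-aware XMLReader. */
    static XMLReader defaultReader() {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newSAXParser().getXMLReader();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(defaultReader(), "<root><a/></root>"));
    }
}
```

With Xerces on the classpath, the intended call would be ReaderBackedTransform.transform(new FragmentDoctypeSkipParser(), xml); that combination is untested in this sketch.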

Greetings from Heidelberg

Wulf



Michael Glavassevich wrote:
Hi Wulf,

Wulf Berschin <bersc...@dosco.de> wrote on 09/16/2009 02:48:52 AM:

 > Hi,
 >
 > For ease of editing, we have a DOCTYPE declaration in each (file)
 > fragment. When I parse the full master (with the fragments resolved), Xerces
 > throws a fatal error (DOCTYPE not allowed in content) and goes into an
 > endless loop when the continue-after-fatal-error feature is set.
 >
 > How can I make Xerces ignore DOCTYPE declarations occurring in content
 > (or, alternatively, in the header of file entities)?

You can't. Xerces (or any conformant XML parser for that matter) will not ignore or skip over any malformed / misplaced constructs in the document. Parsers are required to report the fatal error. The "continue-after-fatal-error" feature which allows Xerces to keep going is unreliable and can lead to a catastrophic failure (NPE, infinite loop, stack overflow, out of memory, etc.) if you turn it on. It's to be used with extreme caution and should never be enabled in a finished component / product.
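
To illustrate the point with a small, hedged example: the sketch below feeds a document with a DOCTYPE inside content to the JDK's built-in SAX parser (used here instead of a standalone Xerces build so that it runs with no extra jars) and checks that parsing ends in a fatal error. The class and method names are made up for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: shows that a misplaced DOCTYPE is reported as a fatal error.
final class DoctypeInContentDemo {

    /** Returns true if parsing the given XML text ends in a SAXParseException. */
    static boolean failsToParse(String xml) {
        try {
            // DefaultHandler.fatalError rethrows the exception it receives.
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return false;
        } catch (SAXParseException e) {
            return true;
        } catch (Exception e) {
            // Configuration or I/O problems are not what this demo is about.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(failsToParse("<root><a/></root>"));
        System.out.println(failsToParse("<root><!DOCTYPE a><a/></root>"));
    }
}
```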

You either need to remove these DOCTYPEs from the files or filter them out at a lower level (e.g. a wrapper InputStream which doesn't return the DOCTYPE from read()).
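
A minimal sketch of the second option, under several assumptions: each fragment is small enough to buffer in memory, its encoding is ASCII-compatible (e.g. UTF-8 or ISO-8859-1, not UTF-16), and the first "<!DOCTYPE" in the file really is the declaration (not text inside a comment or CDATA section). Rather than filtering inside read(), it buffers the stream and returns a new one. DoctypeStripper and its methods are hypothetical names, and readAllBytes() requires Java 9 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: removes the first DOCTYPE declaration before the parser sees the data.
final class DoctypeStripper {

    /** Buffers the stream and returns a new stream without the first DOCTYPE declaration. */
    static InputStream strip(InputStream in) throws IOException {
        // ISO-8859-1 maps bytes to chars one-to-one, so all other bytes round-trip unchanged.
        String text = new String(in.readAllBytes(), StandardCharsets.ISO_8859_1);
        return new ByteArrayInputStream(stripDoctype(text).getBytes(StandardCharsets.ISO_8859_1));
    }

    /** Returns text without its first DOCTYPE declaration, or text unchanged if none is found. */
    static String stripDoctype(String text) {
        int start = text.indexOf("<!DOCTYPE");
        if (start < 0) {
            return text;
        }
        int depth = 0; // greater than 0 while inside the internal subset [...]
        int i = start + "<!DOCTYPE".length();
        while (i < text.length()) {
            char c = text.charAt(i);
            if (c == '"' || c == '\'') {
                // Skip a quoted literal so that ']' or '>' inside it is ignored.
                int close = text.indexOf(c, i + 1);
                if (close < 0) {
                    return text; // unterminated literal: leave the input untouched
                }
                i = close + 1;
            } else if (depth > 0 && text.startsWith("<!--", i)) {
                // Skip a comment inside the internal subset.
                int close = text.indexOf("-->", i + 4);
                if (close < 0) {
                    return text;
                }
                i = close + 3;
            } else if (c == '[') {
                depth++;
                i++;
            } else if (c == ']') {
                depth--;
                i++;
            } else if (c == '>' && depth <= 0) {
                // End of the declaration: drop everything from "<!DOCTYPE" through '>'.
                return text.substring(0, start) + text.substring(i + 1);
            } else {
                i++;
            }
        }
        return text; // unterminated declaration: leave the input untouched
    }
}
```

The stream returned by DoctypeStripper.strip(...) could then be wrapped in an InputSource, for instance from an EntityResolver that serves the fragment files. Processing instructions inside the internal subset are not handled specially in this sketch.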

 > Wulf
 >
 > ---------------------------------------------------------------------
 > To unsubscribe, e-mail: j-users-unsubscr...@xerces.apache.org
 > For additional commands, e-mail: j-users-h...@xerces.apache.org

Thanks.

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrgla...@ca.ibm.com
E-mail: mrgla...@apache.org




