Hi Michael,
thank you for your response. I'll try your approach tomorrow.
In the meantime I experimented a bit with XNI and the parser
configuration and finally got Xerces to ignore this (admittedly
misplaced) construct. Here is what I did:
---
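import org.apache.xerces.parsers.AbstractSAXParser;

/** SAX parser that uses the DOCTYPE-skipping configuration. */
public class FragmentDoctypeSkipParser extends AbstractSAXParser {
    public FragmentDoctypeSkipParser() {
        super(new FragmentDoctypeSkipConfiguration());
    }
}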
---
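import org.apache.xerces.parsers.StandardParserConfiguration;
import org.apache.xerces.xni.parser.XMLDocumentScanner;

/** Parser configuration that plugs in the custom document scanner. */
public class FragmentDoctypeSkipConfiguration extends StandardParserConfiguration {
    protected XMLDocumentScanner createDocumentScanner() {
        return new FragmentDoctypeSkipScannerImpl();
    }
}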
---
import java.io.IOException;
import org.apache.xerces.util.XMLChar;
import org.apache.xerces.util.XMLStringBuffer;
import org.apache.xerces.xni.XMLString;
import org.apache.xerces.xni.XNIException;

public class FragmentDoctypeSkipScannerImpl
        extends org.apache.xerces.impl.XMLDocumentScannerImpl {

    /** Creates a content dispatcher. */
    protected Dispatcher createContentDispatcher() {
        return new FDSContentDispatcher();
    }

    protected class FDSContentDispatcher extends ContentDispatcher {

        /** Scans and discards a DOCTYPE found in content instead of reporting it. */
        protected boolean scanForDoctypeHook() throws IOException, XNIException {
            XMLString xString = new XMLString();
            if (fEntityScanner.skipString("DOCTYPE")) {
                // spaces
                if (!fEntityScanner.skipSpaces()) {
                    reportFatalError(
                        "MSG_SPACE_REQUIRED_BEFORE_ROOT_ELEMENT_TYPE_IN_DOCTYPEDECL",
                        null);
                }
                // root element name
                String doctypeName = fEntityScanner.scanName();
                String publicId = null;
                // external id (inlined variant of scanExternalID(dtIds, false))
                if (fEntityScanner.skipSpaces()) {
                    if (fEntityScanner.skipString("PUBLIC")) {
                        if (!fEntityScanner.skipSpaces()) {
                            reportFatalError("SpaceRequiredAfterPUBLIC", null);
                        }
                        scanPubidLiteral(xString);
                        publicId = xString.toString();
                    }
                    if (publicId != null || fEntityScanner.skipString("SYSTEM")) {
                        if (publicId == null && !fEntityScanner.skipSpaces()) {
                            reportFatalError("SpaceRequiredAfterSYSTEM", null);
                        }
                        fEntityScanner.skipSpaces();
                        int quote = fEntityScanner.peekChar();
                        if (quote != '\'' && quote != '"') {
                            reportFatalError("QuoteRequiredInSystemID", null);
                        }
                        fEntityScanner.scanChar();
                        // read the system literal; its value is discarded afterwards
                        if (fEntityScanner.scanLiteral(quote, xString) != quote) {
                            XMLStringBuffer xsb = new XMLStringBuffer();
                            xsb.clear();
                            do {
                                xsb.append(xString);
                                int c = fEntityScanner.peekChar();
                                if (XMLChar.isMarkup(c) || c == ']') {
                                    xsb.append((char) fEntityScanner.scanChar());
                                }
                            } while (fEntityScanner.scanLiteral(quote, xString) != quote);
                            xsb.append(xString);
                            xString = xsb;
                        }
                        if (!fEntityScanner.skipChar(quote)) {
                            reportFatalError("SystemIDUnterminated", null);
                        }
                    }
                    fEntityScanner.skipSpaces();
                }
                if (fEntityScanner.skipChar('[')) {
                    // has internal subset: skip everything up to the first ']'
                    // (a ']' inside a quoted literal or comment ends the skip too early)
                    while ((char) fEntityScanner.scanChar() != ']') {
                        // discard
                    }
                }
                fEntityScanner.skipSpaces();
                if (!fEntityScanner.skipChar('>')) {
                    reportFatalError("DoctypedeclUnterminated",
                        new Object[] { doctypeName });
                }
                fEntityScanner.skipSpaces();
                setScannerState(SCANNER_STATE_CONTENT);
                // undo the fMarkupDepth++ done in SCANNER_STATE_START_OF_MARKUP
                fMarkupDepth--;
                return true;
            }
            return false;
        }
    }
}
I tested this code and it works as intended.
What do you think of this approach? Is scanForDoctypeHook()
implemented correctly?
It seems to me that it might also be easier to use this parser later
for our XSL conversions.
Greetings from Heidelberg
Wulf
Michael Glavassevich wrote:
Hi Wulf,
Wulf Berschin <bersc...@dosco.de> wrote on 09/16/2009 02:48:52 AM:
> Hi,
>
> for ease of editing we have a doctype declaration in each (file)
> fragment. When I parse the full master (with the fragments resolved),
> Xerces throws a fatal error (Doctype not allowed in content) and goes
> into an endless loop when I set the continue-after-fatal-error switch.
>
> How can I make Xerces ignore doctype declarations occurring in content
> (i.e. in the header of file entities)?
You can't. Xerces (or any conformant XML parser for that matter) will
not ignore or skip over any malformed / misplaced constructs in the
document. Parsers are required to report the fatal error. The
"continue-after-fatal-error" feature which allows Xerces to keep going
is unreliable and can lead to a catastrophic failure (e.g. NPE, infinite
loop, stack overflow, out of memory, etc...) if you turn it on. It's to
be used with extreme caution and should never be enabled in a finished
component / product.
You either need to remove these DOCTYPEs from the files or filter them
out at a lower level (e.g. a wrapper InputStream which doesn't return
the DOCTYPE from read()).
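
A minimal sketch of the "filter at a lower level" idea, under stated assumptions. Instead of a wrapper InputStream it applies the same filtering in a SAX EntityResolver, which hands the parser a fragment whose DOCTYPE has already been removed. The class name, the in-memory MASTER and FRAGMENT strings, and the "chapter.xml" system ID are hypothetical and only serve the illustration. The regular expression is a simplification: it assumes no ']' or '>' inside quoted literals or comments of the declaration. The sketch uses the JAXP parser bundled with the JDK, not a custom Xerces configuration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class DoctypeStrippingDemo {

    // DOCTYPE declaration with an optional internal subset (simplified, see lead-in).
    static final Pattern DOCTYPE =
        Pattern.compile("<!DOCTYPE\\s[^\\[>]*(\\[[^\\]]*\\]\\s*)?>");

    /** Removes the first DOCTYPE declaration from the given text, if any. */
    public static String stripDoctype(String xml) {
        return DOCTYPE.matcher(xml).replaceFirst("");
    }

    // Hypothetical master document that pulls in one fragment as an external entity.
    static final String MASTER =
        "<!DOCTYPE book [ <!ENTITY chap SYSTEM \"chapter.xml\"> ]>"
        + "<book>&chap;</book>";

    // Hypothetical fragment that carries its own DOCTYPE for standalone editing.
    static final String FRAGMENT =
        "<!DOCTYPE chapter SYSTEM \"chapter.dtd\">\n<chapter>Hello</chapter>";

    /** Parses MASTER and returns the element names seen, in document order. */
    public static List<String> parseMaster(boolean strip) throws Exception {
        List<String> names = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                names.add(qName);
            }
        };
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(handler);
        reader.setErrorHandler(handler); // DefaultHandler rethrows fatal errors
        reader.setEntityResolver((publicId, systemId) -> {
            // Serve the fragment from memory, optionally without its DOCTYPE.
            if (systemId != null && systemId.endsWith("chapter.xml")) {
                String text = strip ? stripDoctype(FRAGMENT) : FRAGMENT;
                return new InputSource(new StringReader(text));
            }
            return null; // anything else: default resolution
        });
        reader.parse(new InputSource(new StringReader(MASTER)));
        return names;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parseMaster(true));
        try {
            parseMaster(false);
        } catch (org.xml.sax.SAXException e) {
            System.out.println("fatal error: " + e.getMessage());
        }
    }
}
```

With stripping enabled the master parses normally. With stripping disabled the parser reports the fatal error, as any conformant parser must. In a real setup the resolver would read the fragment file in its declared encoding rather than from a string.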
> Wulf
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: j-users-unsubscr...@xerces.apache.org
> For additional commands, e-mail: j-users-h...@xerces.apache.org
Thanks.
Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrgla...@ca.ibm.com
E-mail: mrgla...@apache.org