DOMParser ignoring HTML DOM

Daniel Farinha Mon, 27 Mar 2006 06:56:29 -0800

Hi all,

I'm parsing an XHTML document using Xerces.
This is the code that I'm using to parse the document:


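import java.io.StringReader;
import org.apache.html.dom.HTMLDocumentImpl;
import org.apache.xerces.parsers.DOMParser;
import org.xml.sax.InputSource;

String xhtmlSource = "<the xhtml source>";
DOMParser parser = new DOMParser();
// Ask Xerces to build the document with the HTML DOM implementation class
parser.setProperty("http://apache.org/xml/properties/dom/document-class-name",
                   "org.apache.html.dom.HTMLDocumentImpl");
InputSource iSource = new InputSource(new StringReader(xhtmlSource));
parser.parse(iSource);
HTMLDocumentImpl document = (HTMLDocumentImpl) parser.getDocument();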

The parsing seems to work, except that when I query the HTMLDocumentImpl, most nodes are of type ElementNSImpl rather than the actual Apache HTML DOM implementation classes. (For example, I can't even call document.getBody(): it returns null. Instead I have to walk the XML DOM looking for the 'body' node.)

This behaviour is described in NekoHTML's 'Requirements and Limitations' section at http://people.apache.org/~andyc/neko/doc/html/index.html

I'm not using NekoHTML; I'm currently using Xerces 2.8.0. I did try various versions of Xerces, but to no avail.

I'm having to carry on working with plain nodes, but I'd much rather work with the HTML DOM.
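
For reference, the plain-node workaround looks roughly like the sketch below. It is only an illustration: it uses the standard JAXP and org.w3c.dom interfaces rather than the Xerces-specific classes, the class name BodyFinder and the method findBody are made up for this example, and it assumes the XHTML has no DOCTYPE that the parser would need to fetch.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: locates the 'body' element by walking plain DOM nodes.
class BodyFinder {

    // Returns the first child element of the root whose local name is "body",
    // or null if there is none.
    static Element findBody(String xhtmlSource) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Namespace awareness is needed so that getLocalName() is non-null.
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xhtmlSource)));

            // Walk the direct children of the root element (normally <html>).
            for (Node n = doc.getDocumentElement().getFirstChild();
                 n != null;
                 n = n.getNextSibling()) {
                if (n.getNodeType() == Node.ELEMENT_NODE
                        && "body".equals(n.getLocalName())) {
                    return (Element) n;
                }
            }
            return null;
        } catch (Exception e) {
            // Wrap checked parser exceptions to keep the sketch short.
            throw new IllegalStateException("Could not parse XHTML source", e);
        }
    }

    public static void main(String[] args) {
        String xhtml = "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
                + "<head><title>t</title></head><body><p>hi</p></body></html>";
        Element body = findBody(xhtml);
        System.out.println(body == null ? "no body found" : body.getLocalName());
    }
}
```

This gets me the element, but only as a generic org.w3c.dom.Element, not as an HTMLBodyElement, which is what I'm really after.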

Can anyone give any hints?

Thanks in advance.

Daniel
