I see. So it looks like I wasn't doing anything wrong after all. (X)HTML support was removed long ago... which is a shame.

Thanks for the info Michael!

Daniel

Michael Glavassevich wrote:
Hi Daniel,

The HTML DOM implementation in Xerces is ancient. It implements DOM Level 1 HTML [1][2] which was intended for use with HTML 4.0 documents only. It does not recognize XHTML [3].

[1] http://www.w3.org/TR/1998/REC-DOM-Level-1-19981001/level-one-html.html
[2] http://www.w3.org/TR/2003/REC-DOM-Level-2-HTML-20030109/html.html#ID-5353782642
[3] http://issues.apache.org/jira/browse/XERCESJ-890

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: [EMAIL PROTECTED]

Daniel Farinha <[EMAIL PROTECTED]> wrote on 03/27/2006 09:56:03 AM:

Hi all,

I'm parsing an XHTML document using Xerces.
This is the code that I'm using to parse the document:

String xhtmlSource = "<the xhtml source>";
DOMParser parser = new DOMParser();
parser.setProperty("http://apache.org/xml/properties/dom/document-class-name", "org.apache.html.dom.HTMLDocumentImpl");
InputSource iSource = new InputSource(new StringReader(xhtmlSource));
parser.parse(iSource);
HTMLDocumentImpl document = (HTMLDocumentImpl)parser.getDocument();

The parsing seems to work, except that when I query the HTMLDocumentImpl, most nodes are of type ElementNSImpl rather than the actual Apache HTML DOM implementation classes. (For example, I can't even call document.getBody(); it returns null. Instead I have to walk the XML DOM looking for the 'body' node.)
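
For reference, a minimal sketch of the DOM walk described above, using only the standard JAXP and org.w3c.dom APIs rather than the Xerces HTML DOM classes. The class name XhtmlBodyFinder and its helper methods are hypothetical names chosen for illustration, and the sketch assumes the document declares the standard XHTML namespace and is parsed namespace-aware:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Xerces: locates the XHTML <body>
// element through the generic DOM API instead of HTMLDocument.getBody().
class XhtmlBodyFinder {
    // Assumption: the document uses the standard XHTML namespace.
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Parses the source into a plain, namespace-aware DOM document.
    static Document parse(String xhtmlSource) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xhtmlSource)));
    }

    // Returns the first <body> element in the XHTML namespace, or null if none exists.
    static Element findBody(Document document) {
        NodeList bodies = document.getElementsByTagNameNS(XHTML_NS, "body");
        return bodies.getLength() > 0 ? (Element) bodies.item(0) : null;
    }

    public static void main(String[] args) throws Exception {
        String xhtml = "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
                + "<head><title>t</title></head><body><p>Hello</p></body></html>";
        Element body = findBody(parse(xhtml));
        System.out.println(body.getLocalName());
    }
}
```

The lookup matches on namespace URI plus local name, so it does not depend on which element implementation class the parser produced.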

This behaviour is described in NekoHTML's 'Requirements and Limitations' section at http://people.apache.org/~andyc/neko/doc/html/index.html

I'm not using NekoHTML, and I'm currently using Xerces 2.8.0. I did try various versions of Xerces but to no avail.

I'm having to carry on working with plain nodes, but I'd much rather work with the HTML DOM.
Can anyone give any hints?

Thanks in advance.

Daniel

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



--
Daniel Farinha
Software Developer
Genial Genetic Solutions Ltd
The Heath Business & Technical Park
Runcorn, Cheshire WA7 4QX
Tel: +44 (0)870 757 9300
Fax: +44 (0)870 757 9301
Email: [EMAIL PROTECTED]
Website: www.genialgenetics.com

