Have you tried setting the 'document-class-name' property [1] so that it points to Xerces' HTML DOM implementation?
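
A minimal sketch of that suggestion, assuming the property id documented at [1] and `org.apache.html.dom.HTMLDocumentImpl` as the document class from Xerces' HTML DOM. The names `HtmlDomProperty`, `PropertySetter` and `apply` are made up for illustration, and the small interface exists only so the snippet compiles without the Xerces or NekoHTML jars on the classpath.

```java
// Hypothetical helper; not part of Xerces or NekoHTML.
final class HtmlDomProperty {

    // Property id as documented in the Xerces2-J properties page [1].
    static final String DOCUMENT_CLASS_NAME =
            "http://apache.org/xml/properties/dom/document-class-name";

    // Document class from Xerces' HTML DOM implementation.
    static final String HTML_DOCUMENT_IMPL =
            "org.apache.html.dom.HTMLDocumentImpl";

    // Stand-in for anything shaped like parser.setProperty(String, Object).
    @FunctionalInterface
    interface PropertySetter {
        void set(String propertyId, Object value) throws Exception;
    }

    // Points the parser at the HTML document class before parsing starts.
    static void apply(PropertySetter setter) {
        try {
            setter.set(DOCUMENT_CLASS_NAME, HTML_DOCUMENT_IMPL);
        } catch (Exception e) {
            // A parser may reject a property it does not recognize or support.
            throw new IllegalStateException(
                    "Could not set " + DOCUMENT_CLASS_NAME, e);
        }
    }

    private HtmlDomProperty() { }
}
```

With the program quoted below, this would be called as `HtmlDomProperty.apply(parser::setProperty);` before `parser.parse(...)`, which is the same as calling `parser.setProperty(...)` directly with those two strings. Whether it changes the class reported for the `input` element has not been checked here.
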
Thanks.

[1] http://xerces.apache.org/xerces2-j/properties.html#dom.document-class-name

Michael Glavassevich
XML Technologies and WAS Development
IBM Toronto Lab
E-mail: mrgla...@ca.ibm.com
E-mail: mrgla...@apache.org

"Yizhou Z." <westward.zh...@gmail.com> wrote on 12/05/2012 11:40:23 AM:

> Hi. I am using NekoHTML to parse a piece of HTML code which includes
> an input element:
>
> <input type="password" name="pw" maxlength="20" class="password"
>     id="Password1" />
>
> My program for parsing HTML is below.
>
> DOMParser parser = new DOMParser();
> parser.setProperty(
>     "http://cyberneko.org/html/properties/default-encoding", "UTF-8");
> parser.setProperty("http://cyberneko.org/html/properties/filters",
>     new XMLDocumentFilter[] { new DefaultFilter() {
>         public void startElement(QName element, XMLAttributes attrs,
>                 Augmentations augs) throws XNIException {
>             element.uri = null;
>             super.startElement(element, attrs, augs);
>         }
>     } });
> BufferedReader in = new BufferedReader(new FileReader("./test.html"));
> parser.parse(new InputSource(in));
> HTMLDocument d = (HTMLDocument) parser.getDocument();
> System.out.println(d.getElementById("Password1").getClass());
>
> The output of the above program is "class
> org.apache.xerces.dom.ElementNSImpl" rather than "class
> org.apache.html.dom.HTMLInputElementImpl", which puzzles me. Is
> there anything I did wrong?
>
> Thanks!