Perhaps you already know... NekoHTML is maintained by another community out in SourceForge [1].

Thanks.

[1] http://sourceforge.net/tracker/?group_id=195122&atid=952178

Michael Glavassevich
XML Technologies and WAS Development
IBM Toronto Lab
E-mail: mrgla...@ca.ibm.com
E-mail: mrgla...@apache.org

"Yizhou Z." <westward.zh...@gmail.com> wrote on 13/05/2012 11:13:00 PM:

> Just tried out parsing some other HTML files, and found Xerces
> worked well for the "input" tags in these HTML files. The previous
> problem seems to have something to do with NekoHTML's parser.
>
> On Sun, May 13, 2012 at 1:22 PM, Yizhou Z. <westward.zh...@gmail.com> wrote:
> NekoHTML parser uses Xerces' HTML DOM implementation. And it seems
> that it can always return the appropriate HTML DOM element objects
> for other types of element nodes. But for <input />, I found it
> returns an object of type "org.apache.xerces.dom.ElementNSImpl". I
> wonder if this is a bug in the version of Xerces that I use.
>
> Thanks.
>
> On Sun, May 13, 2012 at 5:34 AM, Michael Glavassevich <mrgla...@ca.ibm.com> wrote:
> Have you tried setting the 'document-class-name' property [1] so
> that it points to Xerces' HTML DOM implementation?
>
> Thanks.
>
> [1] http://xerces.apache.org/xerces2-j/properties.html#dom.document-class-name
>
> Michael Glavassevich
> XML Technologies and WAS Development
> IBM Toronto Lab
> E-mail: mrgla...@ca.ibm.com
> E-mail: mrgla...@apache.org
>
> "Yizhou Z." <westward.zh...@gmail.com> wrote on 12/05/2012 11:40:23 AM:
>
> > Hi. I am using NekoHTML to parse a piece of HTML code which includes
> > an input element:
> >
> > <input type="password" name="pw" maxlength="20" class="password"
> > id="Password1" />
> >
> > My program for parsing HTML is below.
> >
> > DOMParser parser = new DOMParser();
> > parser.setProperty("http://cyberneko.org/html/properties/default-encoding",
> >     "UTF-8");
> > parser.setProperty("http://cyberneko.org/html/properties/filters",
> >     new XMLDocumentFilter[] { new DefaultFilter() {
> >         // Clear the namespace URI on every element before passing it on.
> >         public void startElement(QName element, XMLAttributes attrs,
> >                 Augmentations augs) throws XNIException {
> >             element.uri = null;
> >             super.startElement(element, attrs, augs);
> >         }
> >     } });
> > BufferedReader in = new BufferedReader(new FileReader("./test.html"));
> > parser.parse(new InputSource(in));
> > HTMLDocument d = (HTMLDocument) parser.getDocument();
> > System.out.println(d.getElementById("Password1").getClass());
> >
> > The output of the above program is "class
> > org.apache.xerces.dom.ElementNSImpl" rather than "class
> > org.apache.html.dom.HTMLInputElementImpl", which puzzles me. Is
> > there anything I did wrong?
> >
> > Thanks!