Re: Problem with parsing HTML

Michael Glavassevich Sat, 12 May 2012 14:35:43 -0700

Have you tried setting the 'document-class-name' property [1] so that it 
points to Xerces' HTML DOM implementation?
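
A rough sketch of that suggestion, for illustration only. The property ID is the one documented at [1], and the class name is Xerces' HTML DOM document class. HtmlDomConfig and PropertySetter are hypothetical helper names introduced here so the snippet compiles without the parser on the classpath. Whether this alone changes the element classes in the quoted program is an assumption, not something confirmed in this thread.

```java
// Sketch only: HtmlDomConfig and PropertySetter are hypothetical helper names,
// not part of Xerces or NekoHTML.
final class HtmlDomConfig {

    /** Xerces property ID for the Document implementation class (see [1]). */
    static final String DOCUMENT_CLASS_NAME =
        "http://apache.org/xml/properties/dom/document-class-name";

    /** Xerces' HTML DOM document class; assumed to be on the classpath. */
    static final String HTML_DOCUMENT_IMPL =
        "org.apache.html.dom.HTMLDocumentImpl";

    /** Minimal stand-in for a parser's setProperty(String, Object) method. */
    interface PropertySetter {
        void setProperty(String id, Object value) throws Exception;
    }

    /** Asks the parser to build its tree with the HTML DOM document class. */
    static void useHtmlDom(PropertySetter parser) throws Exception {
        parser.setProperty(DOCUMENT_CLASS_NAME, HTML_DOCUMENT_IMPL);
    }
}
```

With the parser from the quoted program, the call would be HtmlDomConfig.useHtmlDom(parser::setProperty), made before parser.parse(...).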


Thanks.

[1] http://xerces.apache.org/xerces2-j/properties.html#dom.document-class-name

Michael Glavassevich
XML Technologies and WAS Development
IBM Toronto Lab
E-mail: mrgla...@ca.ibm.com
E-mail: mrgla...@apache.org

"Yizhou Z." <westward.zh...@gmail.com> wrote on 12/05/2012 11:40:23 AM:

> Hi. I am using NekoHTML to parse a piece of HTML code which includes
> an input element:
> <input type="password" name="pw" maxlength="20" class="password" 
> id="Password1" /> 
> 
> My program for parsing HTML is below.
> 
> DOMParser parser = new DOMParser();
> parser.setProperty("http://cyberneko.org/html/properties/default-encoding",
>   "UTF-8");
> parser.setProperty("http://cyberneko.org/html/properties/filters",
>   new XMLDocumentFilter[] { new DefaultFilter() {
>     public void startElement(QName element, XMLAttributes attrs,
>         Augmentations augs) throws XNIException {
>       element.uri = null;
>       super.startElement(element, attrs, augs);
>     }
> } });
> BufferedReader in = new BufferedReader(new FileReader("./test.html"));
> parser.parse(new InputSource(in));
> HTMLDocument d = (HTMLDocument) parser.getDocument();
> System.out.println(d.getElementById("Password1").getClass());
> 
> The program prints "class org.apache.xerces.dom.ElementNSImpl" rather
> than "class org.apache.html.dom.HTMLInputElementImpl", which puzzles
> me. Did I do something wrong?
> 
> Thanks!
