Re: Problem with parsing HTML

Yizhou Z. Sat, 12 May 2012 22:23:32 -0700

The NekoHTML parser already uses Xerces' HTML DOM implementation, and it
seems to return the appropriate HTML DOM element objects for other types of
element nodes. For <input />, however, it returns an object of type
"org.apache.xerces.dom.ElementNSImpl". I wonder if this is a bug in the
version of Xerces that I use.


Thanks.

On Sun, May 13, 2012 at 5:34 AM, Michael Glavassevich
<mrgla...@ca.ibm.com> wrote:

> Have you tried setting the 'document-class-name' property [1] so that it
> points to Xerces' HTML DOM implementation?
>
> Thanks.
>
> [1]
> http://xerces.apache.org/xerces2-j/properties.html#dom.document-class-name
>
> Michael Glavassevich
> XML Technologies and WAS Development
> IBM Toronto Lab
> E-mail: mrgla...@ca.ibm.com
> E-mail: mrgla...@apache.org
>
> "Yizhou Z." <westward.zh...@gmail.com> wrote on 12/05/2012 11:40:23 AM:
>
>
> > Hi. I am using NekoHTML to parse a piece of HTML code which includes
> > an input element:
> >
> > <input type="password" name="pw" maxlength="20" class="password" id="Password1" />
> >
> > My program for parsing HTML is below.
> >
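> > import java.io.BufferedReader;
> > import java.io.FileReader;
> > import org.apache.xerces.xni.Augmentations;
> > import org.apache.xerces.xni.QName;
> > import org.apache.xerces.xni.XMLAttributes;
> > import org.apache.xerces.xni.XNIException;
> > import org.apache.xerces.xni.parser.XMLDocumentFilter;
> > import org.cyberneko.html.filters.DefaultFilter;
> > import org.cyberneko.html.parsers.DOMParser;
> > import org.w3c.dom.html.HTMLDocument;
> > import org.xml.sax.InputSource;
> >
> > public class ParseTest {
> >   public static void main(String[] args) throws Exception {
> >     DOMParser parser = new DOMParser();
> >     parser.setProperty("http://cyberneko.org/html/properties/default-encoding", "UTF-8");
> >     // Filter that clears the namespace URI of every element passed to startElement.
> >     parser.setProperty("http://cyberneko.org/html/properties/filters",
> >       new XMLDocumentFilter[] { new DefaultFilter() {
> >         public void startElement(QName element, XMLAttributes attrs,
> >             Augmentations augs) throws XNIException {
> >           element.uri = null;
> >           super.startElement(element, attrs, augs);
> >         }
> >       } });
> >     BufferedReader in = new BufferedReader(new FileReader("./test.html"));
> >     parser.parse(new InputSource(in));
> >     HTMLDocument d = (HTMLDocument) parser.getDocument();
> >     System.out.println(d.getElementById("Password1").getClass());
> >   }
> > }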
> >
> > The output of the above program is "class
> > org.apache.xerces.dom.ElementNSImpl" rather than "class
> > org.apache.html.dom.HTMLInputElementImpl", which puzzles me. Did I
> > do something wrong?
> >
> > Thanks!
>
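One possible explanation, offered as an assumption rather than a confirmed diagnosis: <input> is a void element, and if NekoHTML hands void elements to filters through the emptyElement callback instead of startElement, a filter that clears element.uri only in startElement never touches <input>. The element would then keep its namespace URI, and the DOM builder would create a generic namespaced element instead of an HTML-specific one. The self-contained Java sketch below models that flow. Its types (Name, Handler, PassThroughFilter, Builder) are hypothetical and simplified, not the NekoHTML or Xerces API, and the builder rule "no URI gives an HTML element, any other URI gives a generic one" is likewise an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical, simplified model of an XNI-style event pipeline.
// None of these types are the real NekoHTML or Xerces API.
public class FilterModel {
    // Stand-in for an element name carrying a namespace URI (like a QName).
    static final class Name {
        String uri;
        final String local;
        Name(String uri, String local) { this.uri = uri; this.local = local; }
    }

    // Stand-in for a document handler that receives parse events.
    interface Handler {
        void startElement(Name n);
        void emptyElement(Name n);
    }

    // Pass-through filter: forwards every event unchanged to the next handler.
    static class PassThroughFilter implements Handler {
        private final Handler next;
        PassThroughFilter(Handler next) { this.next = next; }
        public void startElement(Name n) { next.startElement(n); }
        public void emptyElement(Name n) { next.emptyElement(n); }
    }

    // Assumed builder rule: no URI -> HTML-specific element, otherwise a generic one.
    static class Builder implements Handler {
        final List<String> created = new ArrayList<>();
        private void create(Name n) {
            boolean noUri = n.uri == null || n.uri.isEmpty();
            created.add(n.local + ":" + (noUri ? "html" : "generic"));
        }
        public void startElement(Name n) { create(n); }
        public void emptyElement(Name n) { create(n); }
    }

    // Mirrors the filter in the post: clears the URI in startElement only.
    static Handler startOnlyFilter(Handler next) {
        return new PassThroughFilter(next) {
            @Override public void startElement(Name n) { n.uri = null; super.startElement(n); }
        };
    }

    // Variant that clears the URI in both callbacks.
    static Handler bothCallbacksFilter(Handler next) {
        return new PassThroughFilter(next) {
            @Override public void startElement(Name n) { n.uri = null; super.startElement(n); }
            @Override public void emptyElement(Name n) { n.uri = null; super.emptyElement(n); }
        };
    }

    // Sends one ordinary element and one void element through the filter chain.
    static List<String> run(Function<Handler, Handler> makeFilter) {
        Builder builder = new Builder();
        Handler chain = makeFilter.apply(builder);
        String xhtml = "http://www.w3.org/1999/xhtml";
        chain.startElement(new Name(xhtml, "form"));  // ordinary element
        chain.emptyElement(new Name(xhtml, "input")); // void element (assumed to use emptyElement)
        return builder.created;
    }

    public static void main(String[] args) {
        System.out.println(run(FilterModel::startOnlyFilter));     // prints [form:html, input:generic]
        System.out.println(run(FilterModel::bothCallbacksFilter)); // prints [form:html, input:html]
    }
}
```

If that assumption holds for the real parser, the corresponding change would be to override emptyElement(QName, XMLAttributes, Augmentations) in the DefaultFilter subclass as well, setting element.uri = null there before calling super.emptyElement(element, attrs, augs).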
