Re: [fpc-pascal] XML DOM and HTML

Lee Jenkins Tue, 17 Jun 2008 14:17:44 -0700

Johannes Nohl wrote:

Dear list, dear Michael!

There are multiple problems with HTML parsing: HTML is generally not a well-formed
XML document, because
- the tags are case insensitive (in XML they are case sensitive);
- not all tags need to be closed (e.g. <br> or <p>).
If the HTML is XHTML, then the DOM unit can be used to parse it.
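To see the well-formedness point in practice, the sketch below feeds three fragments to FPC's XML reader (`ReadXMLFile` from the `XMLRead` unit, reading from a `TStringStream`). `IsWellFormed` is a hypothetical helper, not part of the FCL; it simply treats any exception raised while parsing as "not well-formed XML".

```pascal
program WellFormedCheck;

{$mode objfpc}{$H+}

uses
  Classes, SysUtils, DOM, XMLRead;

{ Hypothetical helper: True if Src parses as well-formed XML. }
function IsWellFormed(const Src: string): Boolean;
var
  Doc: TXMLDocument;
  S: TStringStream;
begin
  Result := False;
  Doc := nil;
  S := TStringStream.Create(Src);
  try
    try
      ReadXMLFile(Doc, S);
      Result := True;
    except
      { Any parse error means the input is not well-formed XML. }
      Result := False;
    end;
  finally
    Doc.Free;  { Free is safe on nil }
    S.Free;
  end;
end;

begin
  { XHTML: every tag is closed and open/close tags match in case. }
  WriteLn(IsWellFormed('<div><br /><p>text</p></div>'));
  { HTML: <br> and <p> are never closed. }
  WriteLn(IsWellFormed('<div><br><p>text</div>'));
  { HTML: the open and close tags differ only in case. }
  WriteLn(IsWellFormed('<DIV>text</div>'));
end.
```

The XHTML fragment should parse; the other two should be rejected, one because <br> and <p> are never closed and one because <DIV> and </div> differ in case.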


But how do I retrieve more than the first part of the node's value?

If I read in:
 <div>
  asdf1
  <span>qwer1</span>
  asdf2
  <img src="" />
  asdf3
 </div>

FindNode('div').NodeValue returns "asdf1", but not "asdf2" and "asdf3".
Isn't the example above valid XHTML?
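Only part of the text comes back because the DOM does not store the <div>'s text as a single value. The <div> element has five child nodes: a text node ("asdf1"), the <span> element, a text node ("asdf2"), the <img> element and a text node ("asdf3"); under the W3C DOM an element's own nodeValue is null. Collecting all of it means walking the children with `FirstChild`/`NextSibling` and checking `NodeType`. The sketch below assumes FPC's `DOM` and `XMLRead` units; `CollectText` and its `Recurse` flag are hypothetical names, and trimming each piece and joining with a single space is an arbitrary choice.

```pascal
program CollectDivText;

{$mode objfpc}{$H+}

uses
  Classes, SysUtils, DOM, XMLRead;

const
  { The question's <div>, flattened onto one line. }
  Src = '<div> asdf1 <span>qwer1</span> asdf2 <img src="" /> asdf3 </div>';

{ Hypothetical helper: joins the trimmed text children of Node with spaces.
  With Recurse = True, text inside child elements (e.g. <span>) is included;
  otherwise only Node's direct text children are used. }
function CollectText(Node: TDOMNode; Recurse: Boolean): string;
var
  Child: TDOMNode;
  Piece: string;
begin
  Result := '';
  Child := Node.FirstChild;
  while Child <> nil do
  begin
    Piece := '';
    case Child.NodeType of
      TEXT_NODE, CDATA_SECTION_NODE:
        Piece := Trim(UTF8Encode(Child.NodeValue));
      ELEMENT_NODE:
        if Recurse then
          Piece := CollectText(Child, True);
    end;
    if Piece <> '' then
    begin
      if Result <> '' then
        Result := Result + ' ';
      Result := Result + Piece;
    end;
    Child := Child.NextSibling;
  end;
end;

var
  Doc: TXMLDocument;
  S: TStringStream;
  DivNode: TDOMNode;  { "div" itself is a reserved word in Pascal }
begin
  S := TStringStream.Create(Src);
  try
    ReadXMLFile(Doc, S);
    try
      DivNode := Doc.DocumentElement;  { the <div> is the root element here }
      WriteLn(CollectText(DivNode, False));  { direct text children only }
      WriteLn(CollectText(DivNode, True));   { also the text inside <span> }
    finally
      Doc.Free;
    end;
  finally
    S.Free;
  end;
end.
```

With this input the first call should print "asdf1 asdf2 asdf3" and the second "asdf1 qwer1 asdf2 asdf3".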

If I were going to parse web pages, I would probably opt to use RegEx. There is a regex unit included with FPC, I believe, but I tend to use this one since it's compatible with both FPC and Delphi:


http://regexpstudio.com/TRegExpr/TRegExpr.html
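A minimal sketch of the regex route, assuming the `RegExpr` unit from the link above (recent FPC releases appear to bundle a copy of TRegExpr under the same unit name; check which one your installation provides). It pulls the double-quoted src value out of every <img> tag, with `ModifierI` enabled so that <IMG SRC=...> matches too, which sidesteps HTML's case-insensitive tags. `FindImgSources` is a hypothetical helper, and the pattern is deliberately simple: it ignores single-quoted or unquoted attribute values and comments, so it is not a substitute for a real HTML parser.

```pascal
program ListImgSources;

{$mode objfpc}{$H+}

uses
  Classes, SysUtils, RegExpr;

{ Hypothetical helper: adds the src value of every <img> tag in Html to List. }
procedure FindImgSources(const Html: string; List: TStrings);
var
  Re: TRegExpr;
begin
  Re := TRegExpr.Create;
  try
    { Case-insensitive matching, as HTML tag names are; the
      double-quoted src value is captured in group 1. }
    Re.ModifierI := True;
    Re.Expression := '<img\s[^>]*src\s*=\s*"([^"]*)"';
    if Re.Exec(Html) then
      repeat
        List.Add(Re.Match[1]);
      until not Re.ExecNext;
  finally
    Re.Free;
  end;
end;

var
  Sources: TStringList;
  I: Integer;
begin
  Sources := TStringList.Create;
  try
    { Unclosed <br> and <p> plus mixed-case tags: not well-formed XML. }
    FindImgSources('<P>Logo: <IMG SRC="logo.png"><br>' +
      '<img alt="x" src="photo.jpg"></p>', Sources);
    for I := 0 to Sources.Count - 1 do
      WriteLn(Sources[I]);
  finally
    Sources.Free;
  end;
end.
```

Because the input never has to be well-formed, this works on ordinary HTML where the DOM reader would give up, at the cost of only finding what the pattern anticipates.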

--

Warm Regards,

Lee

_______________________________________________
fpc-pascal maillist  -  fpc-pascal@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-pascal
