On Sat, 7 Jun 2008, Johannes Nohl wrote:
> Dear list,
>
> I played around with the units dom and xmlread. I liked them very
> much. Now I thought I could parse websites with them. But they are
> slightly different as far as I know. In XML everything is within a node,
> while in HTML there is more than one value in a node. E.g.:

There are multiple problems with HTML parsing. HTML is in general not a
well-formed XML document, because
- the tags are case insensitive (in XML they are case sensitive);
- not all tags need to be closed (e.g. <br>, or <p> without </p>).

If the HTML is XHTML, then the DOM and XMLRead units can be used to parse it.

>
> possible XML:
>
> <div>
> asdf1
> <span>qwer1</span>
> <span>qwer2</span>
> </div>
>
> HTML:
> <div>
> asdf1
> <span>qwer1</span>
> asdf2
> <span>qwer2</span>
> asdf3
> </div>
>
> Using XML-Dom I can access value "asdf1" only. I think the second example
> is not valid XML, or?

No, it should be valid. If it weren't valid XML, the XMLRead unit would
raise an error.

Michael.
_______________________________________________
fpc-pascal maillist - fpc-pascal@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-pascal
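A minimal sketch of where "asdf2" and "asdf3" end up. The second example is well-formed XML with mixed content. In the DOM tree the <div> element has five children: the text nodes "asdf1", "asdf2" and "asdf3", interleaved with the two <span> elements. Reading only the first child's value therefore gives "asdf1" alone. Walking the children with FirstChild/NextSibling visits every piece.

This assumes a reasonably recent FPC with the fcl-xml DOM and XMLRead units. The program and procedure names (MixedContent, PrintChildren) are hypothetical.

```pascal
program MixedContent;

{$mode objfpc}{$H+}

uses
  Classes, SysUtils, DOM, XMLRead;

const
  // The "HTML" example from the question; it is also well-formed XML.
  MixedXML =
    '<div>' + LineEnding +
    'asdf1' + LineEnding +
    '<span>qwer1</span>' + LineEnding +
    'asdf2' + LineEnding +
    '<span>qwer2</span>' + LineEnding +
    'asdf3' + LineEnding +
    '</div>';

// Hypothetical helper: print every direct child of Parent in document order.
// Text nodes are trimmed; whitespace-only text nodes are skipped.
procedure PrintChildren(Parent: TDOMNode);
var
  Child: TDOMNode;
  S: string;
begin
  Child := Parent.FirstChild;
  while Child <> nil do
  begin
    case Child.NodeType of
      TEXT_NODE:
        begin
          S := Trim(UTF8Encode(Child.NodeValue));
          if S <> '' then
            WriteLn('text:    ', S);
        end;
      ELEMENT_NODE:
        WriteLn('element: <', UTF8Encode(Child.NodeName), '> = ',
          UTF8Encode(Child.TextContent));
    end;
    Child := Child.NextSibling;
  end;
end;

var
  Doc: TXMLDocument;
  Stream: TStringStream;
begin
  Stream := TStringStream.Create(MixedXML);
  try
    // ReadXMLFile also accepts a TStream; it raises an exception
    // if the input is not well-formed XML.
    ReadXMLFile(Doc, Stream);
    try
      PrintChildren(Doc.DocumentElement);
    finally
      Doc.Free;
    end;
  finally
    Stream.Free;
  end;
end.
```

Run against the snippet above, this should print the three text pieces and the two span elements in document order. For real-world HTML that is not XHTML, the same program would stop with a parse error instead, for the reasons given in the reply.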