Johannes Nohl wrote:
Dear list,
I played around with the units dom and xmlread. I liked them very
much. Now I thought I could parse websites with them. But HTML is
slightly different from XML, as far as I know. In XML everything is within a node,
while in HTML there is more than one value in a node. E.g.:
Johannes Nohl wrote:
Dear list, dear Michael!
There are multiple problems with HTML parsing: HTML is not a well-formed
XML document, because
- The tags are case-insensitive (in XML they are case-sensitive).
- Not all tags have to be closed.
If the HTML is XHTML, then the DOM unit can be used to parse it.
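To make the well-formedness point concrete, here is a minimal sketch. It assumes Python's standard xml.dom.minidom as a stand-in for the Free Pascal dom/xmlread units, since both follow the W3C DOM node model. It is not the FPC API. The helpers parse_markup and child_summary and the two sample documents are hypothetical. The sketch shows three things: XHTML parses into a DOM tree, one element's text can be split across several child nodes (text, element, text), and HTML with an unclosed tag is rejected by an XML parser.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def parse_markup(text):
    """Parse markup as XML; return the DOM document, or None if not well-formed."""
    try:
        return minidom.parseString(text)
    except ExpatError:
        # Raised for mismatched or unclosed tags, among other errors.
        return None


def child_summary(element):
    """Return the direct children of an element as (kind, value) pairs.

    Text nodes and element nodes may alternate ("mixed content"),
    so a single element can hold several separate pieces of text.
    """
    result = []
    for node in element.childNodes:
        if node.nodeType == node.TEXT_NODE:
            result.append(("text", node.data))
        elif node.nodeType == node.ELEMENT_NODE:
            result.append(("element", node.tagName))
    return result


# Hypothetical XHTML sample: every tag closed, names in consistent lowercase.
xhtml = "<html><body><p>Hello <b>bold</b> world</p></body></html>"
# Hypothetical HTML sample: <p> and <br> never closed, tag case mixed.
html = "<HTML><body><p>Hello<br>world</body></html>"

doc = parse_markup(xhtml)
p = doc.getElementsByTagName("p")[0]
print(child_summary(p))    # → [('text', 'Hello '), ('element', 'b'), ('text', ' world')]
print(parse_markup(html))  # → None
```

For the HTML sample, the XML parser gives up at `</body>` because `<br>` is still open at that point. An HTML parser would accept it.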
On Sun, 8 Jun 2008, Johannes Nohl wrote:
> Dear list, dear Michael!
>
> > There are multiple problems with HTML parsing: HTML is not a well-formed
> > XML document, because
> > - The tags are case-insensitive (in XML they are case-sensitive).
> > - Not all tags have to be closed.
> > If the HTML is XHTML, then the DOM unit can be used to parse it.
Dear list, dear Michael!
> There are multiple problems with HTML parsing: HTML is not a well-formed
> XML document, because
> - The tags are case-insensitive (in XML they are case-sensitive).
> - Not all tags have to be closed.
> If the HTML is XHTML, then the DOM unit can be used to parse it.
But ho
On Sat, 7 Jun 2008, Johannes Nohl wrote:
> Dear list,
>
> I played around with the units dom and xmlread. I liked them very
> much. Now I thought I could parse websites with them. But HTML is
> slightly different from XML, as far as I know. In XML everything is within a node,
> while in HTML there is more than one value in a node. E.g.:
Dear list,
I played around with the units dom and xmlread. I liked them very
much. Now I thought I could parse websites with them. But HTML is
slightly different from XML, as far as I know. In XML everything is within a node,
while in HTML there is more than one value in a node. E.g.:
Possible XML:
asdf1