Re: [fpc-pascal] XML DOM and HTML

Michael Van Canneyt Sat, 07 Jun 2008 16:02:01 -0700


On Sat, 7 Jun 2008, Johannes Nohl wrote:


> Dear list,
> 
> I played around with the units dom and xmlread. I liked them very
> much. Now I thought I could parse websites with them. But those are
> slightly different as far as I know. In XML everything is within a node,
> while in HTML there is more than one value in a node. E.g.:

There are several problems with parsing HTML this way. HTML is generally
not a well-formed XML document, because:
- tag names are case insensitive (in XML they are case sensitive)
- not all tags must be closed.
If the HTML is XHTML, then the DOM unit can be used to parse it.
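
To illustrate the two points above, here is a minimal sketch that feeds three
small fragments to ReadXMLFile and reports whether each one parses. The helper
name IsWellFormed and the sample strings are made up for this example; the
sketch assumes the DOM and XMLRead units from the FCL and that parse failures
surface as EXMLReadError.

```pascal
program CheckWellFormed;

{$mode objfpc}{$H+}

uses
  Classes, SysUtils, DOM, XMLRead;

// Hypothetical helper: returns True if Markup parses as XML.
function IsWellFormed(const Markup: String): Boolean;
var
  Doc: TXMLDocument;
  Stream: TStringStream;
begin
  Result := False;
  Doc := nil;
  Stream := TStringStream.Create(Markup);
  try
    try
      ReadXMLFile(Doc, Stream);
      Result := True;
    except
      // The reader reports malformed input through this exception class.
      on E: EXMLReadError do
        WriteLn('  parser says: ', E.Message);
    end;
  finally
    Doc.Free;     // safe on nil; also releases a partially built document
    Stream.Free;
  end;
end;

begin
  // XHTML style: every tag closed, case matches.
  WriteLn('closed br:   ', IsWellFormed('<p>one<br/>two</p>'));
  // HTML style: <br> is never closed.
  WriteLn('unclosed br: ', IsWellFormed('<p>one<br>two</p>'));
  // HTML style: opening and closing tag differ in case.
  WriteLn('mixed case:  ', IsWellFormed('<P>text</p>'));
end.
```

Only the first fragment should be accepted; the other two are fine as HTML but
are rejected by an XML parser for exactly the reasons listed above.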

> 
> possible XML:
> 
> <div>
>  asdf1
>  <span>qwer1</span>
>  <span>qwer2</span>
> </div>
> 
> HTML:
> <div>
>  asdf1
>  <span>qwer1</span>
>  asdf2
>  <span>qwer2</span>
>  asdf3
> </div>
> 
> Using the XML DOM I can access only the value "asdf1". I think the second
> example is not valid XML, is it?

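No, it should be valid. If it weren't well-formed XML, the XMLRead unit would
raise an error. The remaining text is not lost: "asdf2" and "asdf3" are stored
in separate text nodes that are siblings of the <span> elements.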
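
As a rough sketch of how the second example can be read completely, the
program below walks all direct children of the <div> element instead of looking
only at the first one. The program name and the Sample constant are made up for
this example (the whitespace of the original snippet is dropped to keep the
output tidy); it assumes the DOM and XMLRead units from the FCL.

```pascal
program WalkMixedContent;

{$mode objfpc}{$H+}

uses
  Classes, SysUtils, DOM, XMLRead;

const
  // The second snippet from the question, written on one line.
  Sample =
    '<div>asdf1<span>qwer1</span>asdf2<span>qwer2</span>asdf3</div>';

var
  Doc: TXMLDocument;
  Stream: TStringStream;
  Child: TDOMNode;
begin
  Stream := TStringStream.Create(Sample);
  try
    ReadXMLFile(Doc, Stream);
    try
      // Visit every direct child of <div>, not just FirstChild.
      Child := Doc.DocumentElement.FirstChild;
      while Child <> nil do
      begin
        if Child.NodeType = TEXT_NODE then
          WriteLn('text:    ', String(Child.NodeValue))
        else if Child.NodeType = ELEMENT_NODE then
          WriteLn('element: ', String(Child.NodeName), ' = ',
                  String(Child.TextContent));
        Child := Child.NextSibling;
      end;
    finally
      Doc.Free;
    end;
  finally
    Stream.Free;
  end;
end.
```

The loop should report five children in document order: three text nodes
(asdf1, asdf2, asdf3) interleaved with the two span elements. Reading only
the first child's value is what yields just "asdf1".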

Michael.
_______________________________________________
fpc-pascal maillist  -  fpc-pascal@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-pascal
