Here's another case that shows up often in HTML but is illegal in XML, and
that I would need to parse: tags that are never closed, such as <meta>, <p>,
<hr>, and other "singletons".

        <HEAD>
        <META HTTP-EQUIV="Content-Type" CONTENT="text/html">
        </HEAD>

xml_parse() would give an error, because the HEAD block is closed while the
META "block" is still open.
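To make the desired behaviour concrete, here is a minimal sketch of a forgiving, callback-per-tag parser. It uses Python's standard html.parser module rather than PHP, purely as an illustration. The names VOID_TAGS, TagLogger, and parse_tags are made up for this example, and the list of singleton tags is an assumed, partial one.

```python
from html.parser import HTMLParser

# Tags that never take a closing tag (a partial list, assumed for this sketch).
VOID_TAGS = {"meta", "hr", "br", "img", "input", "link"}


class TagLogger(HTMLParser):
    """Records one event per tag, much like XML start/end element handlers."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names before calling us.
        self.events.append(("start", tag, dict(attrs)))
        if tag in VOID_TAGS:
            # Synthesize the end event that the document never supplies.
            self.events.append(("end", tag, {}))

    def handle_endtag(self, tag):
        # Skip void tags: their end event was already emitted above.
        if tag not in VOID_TAGS:
            self.events.append(("end", tag, {}))


def parse_tags(html):
    """Feed a whole document and return the recorded events."""
    parser = TagLogger()
    parser.feed(html)
    parser.close()
    return parser.events


if __name__ == "__main__":
    sample = '<HEAD>\n<META HTTP-EQUIV="Content-Type" CONTENT="text/html">\n</HEAD>'
    for event in parse_tags(sample):
        print(event)
```

With this approach the unclosed META is reported as a start event immediately followed by an end event, so the closing HEAD tag never appears to close a still-open block.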


Nate

-----Original Message-----
From: Nathaniel Hekman [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, March 07, 2001 9:57 AM
To: '[EMAIL PROTECTED]'
Subject: [PHP] parsing html / xml


I'd like to parse an HTML file in much the same way the XML parser works,
i.e. calling a method for every tag encountered and so on.  The XML parsing
functions don't seem to be forgiving enough for much of the HTML that's out
there.  For example, many HTML files have tags like this:

        <TABLE border=0>

but xml_parse() will choke on it because there are no quotes around the "0".
Also, HTML tags are, in practice, case-insensitive, so this is found in many
HTML documents:

        <B>This is bold</b>

but xml_parse() doesn't like it because it expects the opening and closing
tags to have the same case.
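As a rough illustration of what a more forgiving parser does with these two cases, here is a small sketch using Python's standard html.parser module (not PHP, and not a drop-in replacement for xml_parse()). The names EventCollector and collect_events are hypothetical, chosen only for this example.

```python
from html.parser import HTMLParser


class EventCollector(HTMLParser):
    """Collects start, end, and text events in document order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; unquoted values are accepted.
        self.events.append(("start", tag, attrs))

    def handle_endtag(self, tag):
        # Tag names arrive lowercased, so <B> and </b> both report "b".
        self.events.append(("end", tag))

    def handle_data(self, data):
        # Ignore whitespace-only text between tags.
        if data.strip():
            self.events.append(("data", data))


def collect_events(html):
    """Parse an HTML fragment and return the list of events."""
    parser = EventCollector()
    parser.feed(html)
    parser.close()
    return parser.events


if __name__ == "__main__":
    print(collect_events("<TABLE border=0>"))
    print(collect_events("<B>This is bold</b>"))
```

Because the parser normalizes tag names to lowercase and accepts attribute values without quotes, both fragments yield ordinary events instead of errors.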

Are there other functions or libraries I'm not aware of that help in parsing
html?  Or some options in xml_parse to get by these problems?

Thanks in advance.


Nate

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
To contact the list administrators, e-mail: [EMAIL PROTECTED]
