On Tue, 2004-02-24 at 10:49, Henry Grech-Cini wrote:
> http://www.alpha-geek.com/2003/12/31/do_not_do_not_parse_html_with_regexs.html
> 
> Do we all agree or should I keep trying?

The important thing to keep in mind here is to use the right tool for
the job.  If you are parsing an HTML document looking for tags,
attributes, etc., I do recommend using domxml/simplexml/some XML parsing
tool to get your job done quickly and cleanly.  However, if you have a
very specific need to extract some text from a string then you can
probably get away with regular expressions.  The big catch with regexps
is that they have very low reuse value.  Regexps are generally difficult
to read, and you will rarely be able to copy and paste a regular
expression from one piece of code to another.  If your regexp is growing
beyond one line and is taking a long time to process, then it is time to
move on.
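
As a rough illustration of the "very specific need" case, here is a
minimal sketch in Python (the same idea carries over to PHP's preg
functions).  The sample string and the choice of the <title> element
are my own assumptions, picked only to show a short, single-purpose
pattern; it is not meant as a general HTML parser.

```python
import re

# Hypothetical input: a small HTML string we only need one value from.
html = "<html><head><title>Order #1234 shipped</title></head><body></body></html>"

# One short, single-purpose pattern: the text inside the first <title>.
# DOTALL lets the title span lines; the non-greedy group stops at the
# first closing tag.
match = re.search(r"<title>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)

# Fall back to None when the pattern does not match at all.
title = match.group(1).strip() if match else None
```

Once the pattern needs to cope with attributes, nesting, or comments,
that is usually the sign to switch to a real parser.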

Additionally, regular expressions are not good at providing context.
HTML documents happen to be plain text, so if you can parse the text to
get what you need, great.  However, if you want to move through the
elements and attributes, you want something more powerful, like XPath or
XQuery (e.g., you want to find the third fieldset child of the body
element that has an attribute set to "foo").
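
To make the fieldset example concrete, here is a small sketch using
Python's standard xml.etree.ElementTree, which supports a limited
subset of XPath.  I am assuming the document is well-formed XML (XHTML,
say) and that the attribute in question is "class"; the sample markup
and the id values are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical well-formed document; ids exist only to label the elements.
xhtml = """
<html>
  <body>
    <fieldset class="foo" id="a"/>
    <fieldset class="bar" id="b"/>
    <fieldset class="foo" id="c"/>
    <div><fieldset class="foo" id="nested"/></div>
    <fieldset class="foo" id="d"/>
  </body>
</html>
"""

root = ET.fromstring(xhtml)
body = root.find("body")

# Direct fieldset children of <body> whose class attribute is "foo",
# returned in document order.  The nested fieldset is not a direct
# child, so it is skipped.
matches = body.findall("fieldset[@class='foo']")

# Third such child, or None if there are fewer than three.
third = matches[2] if len(matches) >= 3 else None
```

The parser knows which fieldsets are direct children of body and which
are nested deeper, which is exactly the context a regexp does not have.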

As a side note, that article has a link to a similar one that lists a
regexp based XML parser as the only PHP solution. :)

-- 
Adam Bregenzer
[EMAIL PROTECTED]
http://adam.bregenzer.net/

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
