On Tue, 2004-02-24 at 10:49, Henry Grech-Cini wrote:
> http://www.alpha-geek.com/2003/12/31/do_not_do_not_parse_html_with_regexs.html
>
> Do we all agree or should I keep trying?
The important thing to keep in mind here is to use the right tool for the job. If you are parsing an HTML document looking for tags, attributes, etc., I recommend using domxml, simplexml, or some other XML parsing tool to get the job done quickly and cleanly. However, if you have a very specific need to extract some text from a string, then you can probably get away with regular expressions.

The big catch with regexps is that they have very low reuse value. They are generally difficult to read, and you will rarely just copy and paste a regular expression from one piece of code to another. If your regexp is growing beyond one line and is taking a long time to process, then it is time to move on.

Additionally, regular expressions are not good at providing context. HTML documents are just text documents, so if you can parse the text to get what you need, great. However, if you want to move through the elements and attributes, you want something more powerful, like XPath or XQuery (i.e. you want to find the third fieldset child of the body element that has an attribute set to "foo").

As a side note, that article has a link to a similar one that lists a regexp-based XML parser as the only PHP solution. :)

-- 
Adam Bregenzer
[EMAIL PROTECTED]
http://adam.bregenzer.net/
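
P.S. A minimal sketch of the contrast described above, written in Python rather than PHP purely for illustration. The sample document, the function names, and the choice of "class" as the attribute are all assumptions made for the example. It also reads the fieldset query as "the third body-level fieldset among those whose attribute equals foo", which is only one possible interpretation. The input has to be well-formed XML for the parser to accept it.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample document (not from the original thread).
DOC = """<html>
  <head><title>Signup form</title></head>
  <body>
    <fieldset class="foo" id="a"/>
    <fieldset class="bar" id="b"/>
    <fieldset class="foo" id="c"/>
    <div><fieldset class="foo" id="nested"/></div>
    <fieldset class="foo" id="d"/>
  </body>
</html>"""


def title_via_regex(text):
    """Narrow extraction: one known piece of text, no structure needed."""
    match = re.search(r"<title>(.*?)</title>", text, re.DOTALL)
    return match.group(1) if match else None


def third_foo_fieldset(text, attr="class", value="foo"):
    """Structural query: third <fieldset> child of <body> with attr == value."""
    root = ET.fromstring(text)  # root is the <html> element
    # ElementTree supports a limited XPath subset. "./body/fieldset" matches
    # only direct children of <body>, so the fieldset nested in <div> is skipped.
    matches = root.findall(f"./body/fieldset[@{attr}='{value}']")
    return matches[2] if len(matches) >= 3 else None


if __name__ == "__main__":
    print(title_via_regex(DOC))
    print(third_foo_fieldset(DOC).get("id"))
```

The regexp is fine for pulling out the title, but it has no notion of "child of body". The path expression carries that context directly, which is why the nested fieldset never shows up in the result.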