Re: Regexp question

Me Mon, 25 Jun 2001 18:59:53 -0700
> Drats - just when I got the regexp worked out too...
>  $_=~ m/(<A)(.*?)( <\/A>)/

Kudos for working out a regex that works given your
assumptions. If the web pages you will be parsing are
known to stay within those assumptions, then you're done.

But be aware that your regex will fail on some web pages,
and tightening it up to cope with each exception rapidly
becomes a futile exercise. Experienced Perl coders avoid
regexes for most recursive or hierarchical parsing tasks,
such as parsing HTML or XML.
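
To make the failure modes concrete, here is a small sketch that
runs the quoted pattern, unchanged, against a few variations.
The sample strings and the matches() helper are made up for
illustration; they are not from any real page.

```perl
use strict;
use warnings;

# The pattern from the quoted message, unchanged.
my $re = qr/(<A)(.*?)( <\/A>)/;

# Hypothetical helper: true if the pattern matches the given text.
sub matches {
    my ($text) = @_;
    return $text =~ $re ? 1 : 0;
}

# Hypothetical sample inputs, one per common variation.
my @samples = (
    [ 'as assumed',       '<A HREF="a.html">one </A>'       ],
    [ 'lowercase tags',   '<a href="b.html">two </a>'       ],
    [ 'no space at end',  '<A HREF="c.html">three</A>'      ],
    [ 'split over lines', "<A HREF=\"d.html\">four\n </A>" ],
);

for my $sample (@samples) {
    my ($name, $text) = @$sample;
    printf "%-17s %s\n", "$name:", matches($text) ? 'match' : 'no match';
}
```

Only the first sample matches. The pattern is case sensitive,
it requires a space before the closing tag, and "." does not
match a newline unless the /s modifier is given. Each of those
can be patched, but real pages keep producing new exceptions.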

This has been discussed countless times on many Perl
lists over several years. Some recent experimental regex
extensions begin to break down this barrier to using
regexes for recursive and hierarchical formats, but they
are still very much experimental, and they remain pointless
for HTML and XML in particular, given the available CPAN
modules.
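
As one example of the module route, here is a minimal sketch
using HTML::LinkExtor, which ships with the HTML-Parser
distribution on CPAN. It assumes that module is installed. The
sample page and the extract_links() helper are hypothetical,
and this is only one of several modules that could do the job.

```perl
use strict;
use warnings;
use HTML::LinkExtor;    # from the HTML-Parser distribution on CPAN

# Hypothetical sample page: mixed case, and one tag split over lines.
my $html = <<'END_HTML';
<p><a href="one.html">one</a>
<A
  HREF="two.html">two
</A></p>
END_HTML

# Hypothetical helper: return the href of every <a> tag in the page.
sub extract_links {
    my ($page) = @_;

    # With no callback given, the parser collects links internally.
    my $parser = HTML::LinkExtor->new;
    $parser->parse($page);
    $parser->eof;

    my @urls;
    for my $link ($parser->links) {
        # Each entry is [ tag, attr => url, ... ], names lowercased.
        my ($tag, %attr) = @$link;
        push @urls, $attr{href} if $tag eq 'a' && defined $attr{href};
    }
    return @urls;
}

print "$_\n" for extract_links($html);
```

The parser copes with the case and line-break variations itself,
so none of the regex special cases are needed.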

But, as always, TMTOWTDI...

> > > [matching web page links]
> > > [using regexes]
> >
> > Don't use regexes. They aren't the right tools for the task.
> >
> > Use one of the cpan modules for parsing web pages.
> > Some are written specifically for pulling out links.
> >
> > http://search.cpan.org/