On Mon, 17 Jan 2005, Alexander Blüm wrote:

> this is also possible _without_ any modules, except maybe "strict".
>
> # this will replace the contents of each match in @get
> foreach(@array){
>     my @get = $_ =~ /<a href="(.*?)">/g;
> }
What happens if the URL has a doublequote followed by an angle bracket? It's not likely, but it can happen, and it can work. And if such a URL is discovered, this regex would break.

What happens if the URL isn't wrapped in quotes at all? This is much more likely, and again will work fine in browsers. But again, this regex won't find it at all.

This kind of problem is why HTML (and XML) is really best processed using pre-written parser modules, such as HTML::SimpleLinkExtor. A parser has a much better shot at getting a proper view of the document than a simple regex pattern match.

Yes, you can approach such problems using simple regular expressions, such as what we have here, and in many cases they'll work, and maybe even work faster than the parser version would.

On the other hand, this approach is much less generally robust: minor changes that don't break the HTML may break the regex, so you end up having to constantly adjust it to handle all the special cases that come up over time.

If you just parse it at the outset, such as with HTML::SimpleLinkExtor, then the code should be simple, robust, and useful for a long time.

-- 
Chris Devers
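
P.S. To make the failure cases concrete, here is a minimal sketch that runs the quoted regex over a few sample anchors. The sample markup and the example.com URLs are made up for illustration, and I collect the matches with push so they survive the loop; none of this comes from the original code.

```perl
use strict;
use warnings;

# Hypothetical sample markup; a browser will follow every one of these links.
my @array = (
    '<a href="http://example.com/plain">plain</a>',
    '<a href=http://example.com/unquoted>unquoted</a>',
    q{<a href='http://example.com/single'>single quotes</a>},
    '<a href="http://example.com/extra" title="x">extra attribute</a>',
);

# Apply the regex from the quoted message to each line, keeping every capture.
my @get;
foreach my $line (@array) {
    push @get, $line =~ /<a href="(.*?)">/g;
}

# Show what the regex actually captured.
print "$_\n" for @get;
```

Of the four links, only two produce a capture. The unquoted and single-quoted ones are skipped entirely, and the one with a trailing title attribute comes back as `http://example.com/extra" title="x`, because the non-greedy match runs on until the first `">` it can find. Going by its documentation, the parser route would look roughly like creating an HTML::SimpleLinkExtor object, calling `parse` on the HTML, and reading the anchors back with `$extor->a`; treat that as a pointer to the module's docs rather than tested code.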