[PHP] regex pattern for extracting URLs

Brad Fuller Fri, 23 Oct 2009 10:24:58 -0700

I'm looking for a regular expression to accomplish a specific task.

I'm hoping someone who's really good at regex patterns can lend a quick hand.


I need a regex pattern that will grab URLs out of HTML links that have a
certain link text (e.g. the word "Continue").

This is what I have so far, but it does not work properly: if there are
other attributes in the <a> tag, it returns them as part of the URL.

preg_match_all('#<a[\s]+[^>]*href\s*=\s*([\"\']+)([^>]+?)(\1|>)>Continue</a>#i',
$html, $matches);

It needs to extract the URL and disregard any other attributes in the
<a> tag.

Test it with the following examples:

<a href=/path/to/url.html>Continue</a>
<a href='/path/to/url.html'>Continue</a>
<a href="http://example.com/path/to/url.html" class="link">Continue</a>
<a style="font-size: 12px" href="http://example.com/path/to/url.html"
onclick="someFunction('foo','bar')">Continue</a>

Please reply

Your help is much appreciated.

Thanks in advance,
Brad F.
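
For reference, here is a minimal sketch of one pattern that seems to fit
the requirements above. It is not a definitive solution. The function name
extract_link_urls() and its parameters are made up for illustration. The
pattern assumes that no attribute value contains a ">" character and that
the text "href=" does not appear inside another attribute's value. It
handles double-quoted, single-quoted and unquoted href values by using a
PCRE branch-reset group, so the URL always ends up in capture group 1.

```php
<?php
// Hypothetical helper (not from the original post): returns the href values
// of <a> tags whose link text is $linkText, ignoring any other attributes.
function extract_link_urls($html, $linkText = 'Continue')
{
    // Escape the link text so it is matched literally inside the pattern.
    $text = preg_quote($linkText, '#');

    $pattern = '#<a\b[^>]*?\shref\s*=\s*'                    // <a ... href =
             . '(?|"([^"]*)"|\'([^\']*)\'|([^\s"\'>]+))'     // "url", 'url' or bare url
             . '[^>]*>\s*' . $text . '\s*</a>#i';            // other attributes, link text

    preg_match_all($pattern, $html, $matches);

    // The branch-reset group (?|...) stores every alternative in group 1.
    return $matches[1];
}

// The four test cases from the message above.
$html = <<<'HTML'
<a href=/path/to/url.html>Continue</a>
<a href='/path/to/url.html'>Continue</a>
<a href="http://example.com/path/to/url.html" class="link">Continue</a>
<a style="font-size: 12px" href="http://example.com/path/to/url.html"
onclick="someFunction('foo','bar')">Continue</a>
HTML;

print_r(extract_link_urls($html));
```

For HTML that is less regular than these examples, a real parser such as
PHP's DOMDocument is likely to be more robust than any single regex.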

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
