Try using the URI module to work out the absolute URL.

use URI;

# the base is the *current absolute page*
my $base_url = 'http://foo.com/documents/help.html';

print URI->new_abs('doc1.html', $base_url), "\n";
print URI->new_abs('./doc2.html', $base_url), "\n";
print URI->new_abs('../documents/doc3.html', $base_url), "\n";
print URI->new_abs('http://somewhere.com/', $base_url), "\n";

<<< SCRIPT OUTPUT >>>
http://foo.com/documents/doc1.html
http://foo.com/documents/doc2.html
http://foo.com/documents/doc3.html
http://somewhere.com/

So your regex *might* look like this (untested)...

my $base = 'http://foo.com/documents/help.html';
$html_code =~ s/href="(.*?)"/'href="' . URI->new_abs($1, $base) . '"'/seg;
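
For what it's worth, here is a slightly fuller sketch of the same idea that can be run as-is. The helper name absolutize_hrefs and the sample HTML are made up for illustration, and it still assumes every href is written with double quotes; a real HTML parser would be more robust than a regex for pages you don't control.

```perl
use strict;
use warnings;
use URI;

# Hypothetical helper (not from the original post): rewrite every
# double-quoted href in $html so it is absolute with respect to $base.
sub absolutize_hrefs {
    my ($html, $base) = @_;

    # /e evaluates the replacement as Perl code, /g rewrites every link,
    # /i also matches HREF. [^"]* stops at the closing quote.
    $html =~ s{href="([^"]*)"}{'href="' . URI->new_abs($1, $base) . '"'}egi;

    return $html;
}

# the base is the *current absolute page*, as in the example above
my $base = 'http://foo.com/documents/help.html';

my $html_code = <<'HTML';
<a href="./doc2.html">relative</a>
<a HREF="http://somewhere.com/">absolute</a>
HTML

print absolutize_hrefs($html_code, $base);
```

Because URI->new_abs returns an already-absolute URL unchanged, there is no need to test for a leading http:// or https:// in the regex itself.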

Rob


-----Original Message-----
From: Dan Muey [mailto:[EMAIL PROTECTED]
Sent: Wednesday, June 04, 2003 3:26 PM
To: [EMAIL PROTECTED]
Subject: Modify links


I'm trying to work out a regex that will do this:

Take an entire page's HTML: my $html_code; # all lines in this one variable

And make any href's that are relative absolute by prepending $url into them:
        $url = "http://myclonesite.com";
        make <a href="./documents/help.html"> into <a
href="http://myclonesite.com/documents/help.html">
        $html_code =~ s/href\=\"\.?\/?(.*)\"/href\=\"$url\/$1\"/ig;
        the problem with this is that it prepends $url to absolute URLs as well

I need to say:
Put $url in front of relative URLs in an href (turn ./foo, /foo, or foo into
$url/foo; ../foo would have to be treated differently and is ignored for now),
but only if the href does not start with https?://

Any ideas?

TIA
Dan


-- 
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
